User profiles matching in social networks 3
Data Collecting. Our approach consists of several stages. First, we collect data
from two social networks using a crawling framework (profiles, photos from albums
and posts) [1]. To validate our results, we collect a set of VKontakte profiles
that contain an explicit link to the owner's secondary profile on Instagram; this
is the only possible way to build the labelled dataset.
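As a minimal sketch of how such labelled pairs might be assembled, assume each crawled VKontakte profile is available as a dictionary of free-text fields. The field names `id`, `site` and `status`, and the helpers `instagram_username` and `labelled_pairs`, are hypothetical and are not part of the crawling framework [1]:

```python
import re

# Matches "instagram.com/<username>" with an optional scheme and "www." prefix.
# Assumption: usernames consist of letters, digits, underscores and periods.
_INSTAGRAM_LINK = re.compile(
    r"(?:https?://)?(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)",
    re.IGNORECASE,
)


def instagram_username(text):
    """Return the lower-cased Instagram username linked in `text`, or None."""
    match = _INSTAGRAM_LINK.search(text or "")
    if match is None:
        return None
    # Drop a trailing period that belongs to the surrounding sentence.
    return match.group(1).rstrip(".").lower()


def labelled_pairs(vk_profiles, fields=("site", "status")):
    """Build (vk_id, instagram_username) pairs from profiles with an explicit link."""
    pairs = []
    for profile in vk_profiles:
        for field in fields:
            username = instagram_username(profile.get(field))
            if username:
                pairs.append((profile["id"], username))
                break  # one explicit link per profile is enough
    return pairs
```

Profiles without an explicit link are simply skipped, so the labelled set stays limited to the pairs the users declared themselves.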
Face Detection and Embedding. We process photos using two algorithms:
1. face detection - we apply MTCNN (Multi-task Cascaded Convolutional
Networks) [11], which achieved efficiency superior to its closest competitors
and is not affected by the scaling of faces;
2. face embedding - we apply the FaceNet neural network [7] to construct
embeddings of the extracted faces.
We apply MTCNN pre-trained on the WIDER FACE dataset and FaceNet pre-trained
on the VGGFace2 dataset. The resulting data is then filtered.
Filtering. The extracted face embeddings are further filtered by their parameters
according to several heuristics:
1. filtering by number of pixels (hereinafter, we will use the term quality of the
2. filtering by anchors (removal of children's faces).
FaceNet has limitations on the minimum required quality of images, so
we filter face images by the number of pixels in each face. Accurate
control of the above parameters makes it possible to achieve improved precision
and recall of matching; this is partly due to the behaviour of the selected method
for embedding construction. In the experimental study in Sect. 4 we found that
the quality of facial images affects the final matching efficiency: taking it into
account improves the F1-score by 4%.
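The pixel-count heuristic can be sketched as follows. This is an illustration only: the `DetectedFace` record, the function `filter_by_pixels` and the `min_pixels` threshold are assumptions, since the text does not state the data layout or the threshold value used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectedFace:
    """Hypothetical record for one detected face and its embedding."""
    width: int              # bounding-box width in pixels
    height: int             # bounding-box height in pixels
    embedding: List[float]  # face embedding vector


def filter_by_pixels(faces, min_pixels):
    """Keep only faces whose bounding box covers at least `min_pixels` pixels."""
    return [face for face in faces if face.width * face.height >= min_pixels]
```

For example, `filter_by_pixels(faces, min_pixels=2500)` would discard every face smaller than a 50 x 50 crop; the value 2500 is purely illustrative.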
The other heuristic is probably related to a limitation of the VGGFace2
dataset on which FaceNet was trained. VGGFace2 contains young and
mature faces but does not contain the faces of babies and small children.
As a result, embeddings of children's faces have a very small margin
between each other. That is why we remove such faces from the user's
collection of photos to avoid mismatching of profiles. Figure 1 shows that the
distribution of distances between embeddings of children's faces is shifted relative
to the distribution of distances between embeddings of random people's faces.
Additional filtering of data is accomplished using so-called anchors. An
anchor is a vector that represents some region of the space of embedded faces.
In our study, we use the anchor to represent the faces of children. We create it
in the following way. A set of children's faces was collected semi-automatically:
we find kindergarten and photographers' accounts using tags and specific
usernames, for instance tags under photos containing the words “children”,
“kindergarten”, etc. Then we build an anchor as the element-wise mean of all
vectors of children's faces. All face embeddings which are close to this anchor
are removed from the dataset.
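A minimal sketch of the anchor heuristic is given below. It assumes that closeness is measured by Euclidean distance and that a cut-off `threshold` has been chosen beforehand; neither is specified in the text above, and the function names are hypothetical.

```python
import math


def build_anchor(embeddings):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not embeddings:
        raise ValueError("at least one embedding is required to build an anchor")
    count = len(embeddings)
    # zip(*embeddings) iterates over the vectors coordinate by coordinate.
    return [sum(column) / count for column in zip(*embeddings)]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_near_anchor(embeddings, anchor, threshold):
    """Drop every embedding whose distance to the anchor is at most `threshold`."""
    return [e for e in embeddings if euclidean(e, anchor) > threshold]
```

The anchor would be built once from the semi-automatically collected children's faces and then applied to the embeddings of every user's photo collection.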
Code repository used -