Data Collecting. Our approach consists of several stages. First, we collect data
from two social networks using a crawling framework (profiles, photos from albums
and posts) [1]. To validate our results, we collect a set of profiles from
VKontakte that contain an explicit link to the user's secondary profile on
Instagram; this is the only possible way to build the labelled dataset.
Face Detection and Embedding. We process photos using two algorithms:
1. face detection - we apply MTCNN (Multi-task Cascaded Convolutional
Networks) [11], which achieved efficiency superior to its closest competitors
and is not affected by the scale of the faces;
2. face embedding - we apply the FaceNet neural network [7] to construct
embeddings of the extracted faces.
We apply MTCNN pre-trained on the WIDER FACE dataset and FaceNet pre-trained
on the VGGFace2 dataset¹. The resulting data is then filtered.
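The two-step processing described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `detect_faces` and `embed_face` are hypothetical callables standing in for MTCNN and FaceNet, and the bounding-box layout `(x1, y1, x2, y2)` is assumed rather than taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x1, y1, x2, y2) in pixels
Embedding = List[float]


def embed_profile_photos(
    photos: Sequence[object],
    detect_faces: Callable[[object], List[Box]],
    embed_face: Callable[[object, Box], Embedding],
) -> List[Tuple[Box, Embedding]]:
    """Detect every face in every photo of one profile, then embed each face.

    detect_faces is a hypothetical stand-in for MTCNN,
    embed_face is a hypothetical stand-in for FaceNet.
    """
    results: List[Tuple[Box, Embedding]] = []
    for photo in photos:
        for box in detect_faces(photo):
            # Keep the box next to the embedding so later filters can use it.
            results.append((box, embed_face(photo, box)))
    return results
```

Keeping the bounding box together with each embedding is a design choice of this sketch: it lets the subsequent filtering stage inspect the face size without re-running detection.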
Filtering. The extracted face embeddings are then filtered according to
several heuristics:
1. filtering by number of pixels (hereinafter, we use the term quality of the
image for this parameter);
2. filtering by anchors (removal of children's faces).
FaceNet has limitations on the minimum required quality of images, so
we filter face images by the number of pixels they contain. Accurate
control of the above parameters makes it possible to improve the precision and
recall of matching; this is partly due to the behaviour of the selected method for
embedding construction. In the experimental study in Sect. 4 we found that the
quality of facial images affects the final matching efficiency: taking it into
account improves the F1-score by 4%.
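A minimal sketch of the pixel-count filter is given below. The threshold `MIN_FACE_PIXELS` is a hypothetical value chosen for illustration (the text does not state one here), and the bounding-box layout `(x1, y1, x2, y2)` is likewise an assumption.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x1, y1, x2, y2) in pixels

# Hypothetical threshold, for illustration only; not a value from the paper.
MIN_FACE_PIXELS = 60 * 60


def face_pixels(box: Box) -> int:
    """Number of pixels covered by a face bounding box."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def filter_by_quality(boxes: List[Box], min_pixels: int = MIN_FACE_PIXELS) -> List[Box]:
    """Keep only the faces whose pixel count reaches the minimum."""
    return [box for box in boxes if face_pixels(box) >= min_pixels]
```

In practice the threshold would presumably be tuned against a validation metric such as the F1-score.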
The second heuristic is probably related to a limitation of the
VGGFace2 dataset, on which FaceNet was trained. VGGFace2 contains young and
mature faces but does not contain the faces of babies and small children.
As a result, embeddings of children's faces have a very small margin
between each other. That is why we should remove such faces from the user's
collection of photos to avoid mismatching of profiles. Figure 1 shows that the
distribution of distances between embeddings of children's faces is shifted
relative to the distribution of distances between embeddings of random people's
faces.
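The comparison of distance distributions can be illustrated with a short sketch that computes all pairwise distances within a group of embeddings. Euclidean distance is assumed here, and the two-dimensional vectors in the usage note are toy values, not real FaceNet embeddings.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pairwise_distances(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance (an assumed metric) for every unordered pair."""
    return [math.dist(a, b) for a, b in combinations(embeddings, 2)]
```

For example, with toy groups `tight = [[0, 0], [0.1, 0], [0, 0.1]]` and `spread = [[0, 0], [1, 0], [0, 1]]`, the mean of `pairwise_distances(tight)` is smaller than that of `pairwise_distances(spread)`, which mirrors the small margin between embeddings described above.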
Additional filtering of data is accomplished using so-called anchors. An anchor
is a vector that represents some region of the space of embedded faces. In our
study, we use the anchor to represent the faces of children. We create it in the
following way. A set of children's faces was collected semi-automatically: we
find the accounts of kindergartens and photographers using tags and specific
usernames, for instance, tags under photos containing the words “children”,
“kindergarten”, etc. Then we build the anchor as the element-wise mean of all
vectors of children's faces. All face embeddings that are close to this anchor
are removed from the dataset.
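The anchor construction and the subsequent filtering can be sketched as follows. The element-wise mean follows the description above; the use of Euclidean distance and the `threshold` parameter that defines "close" are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def build_anchor(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty set of embeddings."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    count = len(embeddings)
    # zip(*embeddings) iterates over the embedding dimensions (columns).
    return [sum(column) / count for column in zip(*embeddings)]


def remove_near_anchor(
    embeddings: Sequence[Sequence[float]],
    anchor: Sequence[float],
    threshold: float,  # hypothetical cut-off; the text does not give a value
) -> List[Sequence[float]]:
    """Drop every embedding whose distance to the anchor is within threshold."""
    return [e for e in embeddings if math.dist(e, anchor) > threshold]
```

A usage example with toy two-dimensional vectors: the anchor of `[[0, 0], [2, 0], [0, 2], [2, 2]]` is `[1.0, 1.0]`, and with `threshold=1.0` the face `[1.0, 1.2]` is removed while `[5.0, 5.0]` is kept.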
¹ Code repository used: https://github.com/davidsandberg/facenet