SuperGlue: Learning Feature Matching with Graph Neural Networks
Paul-Edouard Sarlin¹∗   Daniel DeTone²   Tomasz Malisiewicz²   Andrew Rabinovich²
¹ETH Zurich   ²Magic Leap, Inc.
Abstract
This paper introduces SuperGlue, a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Assignments are estimated by solving a differentiable optimal transport problem, whose costs are predicted by a graph neural network. We introduce a flexible context aggregation mechanism based on attention, enabling SuperGlue to reason about the underlying 3D scene and feature assignments jointly. Unlike traditional, hand-designed heuristics, our technique learns priors over geometric transformations and regularities of the 3D world through end-to-end training from image pairs. SuperGlue outperforms other learned approaches and achieves state-of-the-art results on the task of pose estimation in challenging real-world indoor and outdoor environments. The proposed method performs matching in real time on a modern GPU and can be readily integrated into modern SfM or SLAM systems. The code and trained weights are publicly available at github.com/magicleap/SuperGluePretrainedNetwork.
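To make the attention-based context aggregation concrete, the sketch below shows single-head scaled dot-product attention and one message-passing step that runs either in self mode (each image attends to its own keypoints) or in cross mode (each image attends to the other image). It is a simplified illustration under stated assumptions, not the released implementation: there are no learned query/key/value projections, no keypoint position encoding, and the update is a plain residual sum rather than a learned network. The names `attention` and `message_passing_layer` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per keypoint, one column per feature channel


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (assumption: identity projections)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def message_passing_layer(feats_a: Matrix, feats_b: Matrix, cross: bool):
    """One self- or cross-attention step with a simplified residual update."""
    source_a = feats_b if cross else feats_a  # cross: image A listens to image B
    source_b = feats_a if cross else feats_b  # cross: image B listens to image A
    msg_a = attention(feats_a, source_a, source_a)
    msg_b = attention(feats_b, source_b, source_b)
    # Hypothetical residual update: add the aggregated message to each feature.
    new_a = [[x + m for x, m in zip(f, g)] for f, g in zip(feats_a, msg_a)]
    new_b = [[x + m for x, m in zip(f, g)] for f, g in zip(feats_b, msg_b)]
    return new_a, new_b
```

Stacking layers that alternate `cross=False` and `cross=True` lets each keypoint gather context first from its own image and then from the other one, which mirrors the intra-/inter-image split described in the text.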
1. Introduction
Correspondences between points in images are essential for estimating the 3D structure and camera poses in geometric computer vision tasks such as Simultaneous Localization and Mapping (SLAM) and Structure-from-Motion (SfM). Such correspondences are generally estimated by matching local features, a process known as data association. Large viewpoint and lighting changes, occlusion, blur, and lack of texture are factors that make 2D-to-2D data association particularly challenging.
In this paper, we present a new way of thinking about the feature matching problem. Instead of learning better task-agnostic local features followed by simple matching heuristics and tricks, we propose to learn the matching process from pre-existing local features using a novel neural architecture called SuperGlue. In the context of SLAM, which typically decomposes the problem into the visual feature extraction front-end and the bundle adjustment or pose estimation back-end [7], our network lies directly in the middle: SuperGlue is a learnable middle-end (see Figure 1).
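For reference, the "simple matching heuristics and tricks" that SuperGlue replaces often look like the sketch below: nearest-neighbor descriptor matching filtered by Lowe's ratio test and a mutual (cross-check) test. This baseline is not part of SuperGlue; the function names and the 0.8 ratio threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def nearest(query: Descriptor, candidates: List[Descriptor]) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs sorted by ascending Euclidean distance."""
    return sorted((math.dist(query, c), j) for j, c in enumerate(candidates))


def ratio_test_matches(desc_a: List[Descriptor], desc_b: List[Descriptor],
                       ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Hand-designed baseline: nearest neighbor + ratio test + mutual check."""
    matches = []
    for i, da in enumerate(desc_a):
        ranked = nearest(da, desc_b)
        best_dist, best_j = ranked[0]
        # Ratio test: reject when the second-best candidate is nearly as close.
        if len(ranked) > 1 and best_dist >= ratio * ranked[1][0]:
            continue
        # Mutual check: i must also be the nearest neighbor of best_j in image A.
        if nearest(desc_b[best_j], desc_a)[0][1] == i:
            matches.append((i, best_j))
    return matches
```

Each decision here looks only at descriptor distances of one point at a time, with fixed thresholds; the approach described in this paper instead learns the matching from context across both images.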
[Figure 1 diagram: Detector & Descriptor (Deep Front-End) → SuperGlue (Deep Middle-End Matcher) → Back-End Optimizer]
Figure 1: Feature matching with SuperGlue. Our approach establishes pointwise correspondences from off-the-shelf local features: it acts as a middle-end between a handcrafted or learned front-end and the back-end. SuperGlue uses a graph neural network and attention to solve an assignment optimization problem, and handles partial point visibility and occlusion elegantly, producing a partial assignment.
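A partial assignment means that every keypoint is matched to at most one keypoint in the other image, and some keypoints (for example occluded ones) stay unmatched. One simple way to read such an assignment off a soft score matrix, assumed here for illustration rather than taken from this section, is to keep a pair only if it is the maximum of both its row and its column and exceeds a confidence threshold. The function name and the 0.2 threshold are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def extract_partial_assignment(scores: Matrix,
                               threshold: float = 0.2) -> Tuple[List[int], List[int]]:
    """Return match indices for each image; -1 marks an unmatched keypoint."""
    m, n = len(scores), len(scores[0])
    match_a = [-1] * m
    match_b = [-1] * n
    # Best row index for every column, used for the mutual-maximum check.
    col_best = [max(range(m), key=lambda i: scores[i][j]) for j in range(n)]
    for i, row in enumerate(scores):
        j = max(range(n), key=lambda k: row[k])
        # Keep (i, j) only if it is a mutual maximum and confident enough,
        # so each keypoint ends up matched to at most one partner.
        if col_best[j] == i and row[j] > threshold:
            match_a[i] = j
            match_b[j] = i
    return match_a, match_b
```

For example, in a 3×3 score matrix where the middle row has no confident entry, that keypoint is returned as `-1`, which is how occluded or non-repeatable points are left out of the assignment.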
In this work, learning feature matching is viewed as finding the partial assignment between two sets of local features. We revisit the classical graph-based strategy of matching by solving a linear assignment problem, which, when relaxed to an optimal transport problem, can be solved differentiably. The cost function of this optimization is predicted by a Graph Neural Network (GNN). Inspired by the success of the Transformer [55], the GNN uses self- (intra-image) and cross- (inter-image) attention to leverage both the spatial relationships of the keypoints and their visual appearance. This formulation enforces the assignment structure of the predictions while enabling the cost to learn complex priors, elegantly handling occlusion and non-repeatable keypoints. Our method is trained end-to-end from image pairs: we learn priors for pose estimation from a large annotated dataset, enabling SuperGlue to reason about the 3D scene and the assignment. Our work can be applied to a variety of multiple-view geometry problems that require high-quality feature correspondences (see Figure 2).
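The relaxation mentioned above can be illustrated with the Sinkhorn algorithm, a standard way to solve entropy-regularized optimal transport using only differentiable operations (exp, log, sums). The sketch below is a minimal version under stated assumptions: a square score matrix (higher means more similar), uniform unit marginals, a temperature `eps`, and a fixed iteration count; it has no mechanism for unmatched points. Names and defaults are hypothetical. As `eps` shrinks, the result approaches the hard permutation matrix of the linear assignment problem.

```python
import math
from typing import List

Matrix = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    """Stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sinkhorn(scores: Matrix, eps: float = 0.1, iters: int = 100) -> Matrix:
    """Log-domain Sinkhorn: returns an approximately doubly stochastic matrix P."""
    n = len(scores)
    if any(len(row) != n for row in scores):
        raise ValueError("this sketch assumes a square score matrix")
    log_k = [[s / eps for s in row] for row in scores]  # log of the Gibbs kernel
    log_u = [0.0] * n
    log_v = [0.0] * n
    for _ in range(iters):
        # Rescale rows so that each row of P sums to 1 ...
        log_u = [-logsumexp([log_k[i][j] + log_v[j] for j in range(n)]) for i in range(n)]
        # ... then rescale columns so that each column of P sums to 1.
        log_v = [-logsumexp([log_k[i][j] + log_u[i] for i in range(n)]) for j in range(n)]
    return [[math.exp(log_k[i][j] + log_u[i] + log_v[j]) for j in range(n)]
            for i in range(n)]
```

Working in the log domain avoids overflow when `eps` is small, and because every step is a smooth function of `scores`, gradients can flow back into whatever network predicts the scores.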
∗Work done at Magic Leap, Inc. for a Master's degree. The author thanks his academic supervisors: Cesar Cadena, Marcin Dymczyk, Juan Nieto.