Local Difference Binary for Ultrafast
and Distinctive Feature Description
Xin Yang, Member, IEEE, and
Kwang-Ting (Tim) Cheng, Fellow, IEEE
Abstract—The efficiency and quality of a feature descriptor are critical to the user
experience of many computer vision applications. However, the existing
descriptors are either too computationally expensive to achieve real-time
performance, or not sufficiently distinctive to identify correct matches from a large
database with various transformations. In this paper, we propose a highly efficient
and distinctive binary descriptor, called local difference binary (LDB). LDB directly
computes a binary string for an image patch using simple intensity and gradient
difference tests on pairwise grid cells within the patch. A multiple-gridding strategy
and a salient bit-selection method are applied to capture the distinct patterns of the
patch at different spatial granularities. Experimental results demonstrate that
compared to the existing state-of-the-art binary descriptors, primarily designed for
speed, LDB has similar construction efficiency while achieving greater accuracy
and faster speed for mobile object recognition and tracking tasks.
Index Terms—Binary feature descriptor, mobile devices, object recognition,
tracking, augmented reality
1 INTRODUCTION
FEATURE point descriptors are widely used in many computer
vision tasks such as marker-less augmented reality (AR) [10], [11],
simultaneous localization and mapping (SLAM) [16], and image
retrieval [19], [21]. Their broad applications have driven the
development of a plethora of descriptors [1], [2], [3], [4], [5], [6], [7],
[8]. However, as application requirements become increasingly
demanding, for example, handling larger databases and/or
running in real time on handheld devices, the demand for more
advanced descriptors is stronger than ever.
An ideal descriptor should achieve two competing goals: high-
quality description and low computational complexity. High-
quality descriptors capture the most representative information in
an image, such that different image content can be distinguished
(i.e., high distinctiveness) and the same content subject to various
image distortions can be recognized (i.e., high robustness). High-
speed descriptors enable the entire application task to run in real
time at a sufficiently high frame rate.
Many research efforts have aimed to meet either strict
quality requirements or high computational speed. The SIFT
descriptor [2], proposed over a decade ago, has been widely
adopted as one of the highest quality options. However, it imposes
a heavy computation burden. This drawback has drawn extensive
efforts [1], [3] for optimizing its speed without compromising its
quality too much. Among the enhancements, SURF [1] is arguably
the most notable. However, recent experiments [22] have shown that
the SURF descriptor is still too computationally heavy; thus only a
limited number of points can be handled for real-time applications
such as AR, especially for handheld devices such as smartphones
and tablets. At the other end of the spectrum, aiming primarily at
fast runtime, lightweight binary descriptors such as BRISK [6],
FREAK [7], BRIEF [4], and its variant rBRIEF (or ORB descriptor)
[5] have become increasingly popular as they are very efficient to
store and to match (simply computing the Hamming distance
between descriptors via XOR and bit-count operations). These
runtime advantages make them more suitable for real-time
applications and handheld devices. However, these binary
descriptors utilize overly simplified information, i.e., raw inten-
sities of a subset of pixels within an image patch for binary tests,
and thus have low discriminative ability. Lack of distinctiveness
incurs an enormous number of false matches when matching
against a large database. Expensive postverification methods (e.g.,
RANSAC [23] or PROSAC [13]) are usually required to discover
and validate matching consensus, increasing the runtime of the
entire process.
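
The matching step mentioned above (XOR followed by a bit count) can be illustrated with a minimal sketch. It assumes descriptors are stored as packed NumPy uint8 arrays; the function names hamming_distance and match are hypothetical and are not taken from any of the cited descriptors' implementations.

```python
from typing import Tuple

import numpy as np


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary descriptors (uint8 arrays)."""
    # XOR leaves a 1 bit wherever the two descriptors disagree.
    diff = np.bitwise_xor(a, b)
    # Counting the set bits gives the number of disagreeing positions.
    return int(np.unpackbits(diff).sum())


def match(query: np.ndarray, database: np.ndarray) -> Tuple[int, int]:
    """Brute-force nearest neighbour of `query` among the rows of `database`.

    Returns (index of the best row, its Hamming distance).
    """
    diff = np.bitwise_xor(database, query)            # broadcast over rows
    dists = np.unpackbits(diff, axis=1).sum(axis=1)   # per-row bit count
    best = int(np.argmin(dists))
    return best, int(dists[best])
```

Because the distance is only a count of differing bits, it says nothing about whether a nearest neighbour is a true match, which is why weakly distinctive descriptors still need a verification stage such as RANSAC.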
In this paper, we introduce a new binary descriptor, named
local difference binary (LDB), which achieves computational
speed and robustness similar to those of the state-of-the-art binary
descriptors [4], [5], [6], [7], yet offers much higher distinctiveness.
The high quality of LDB is achieved through
three schemes. First, LDB utilizes both the average intensity I_avg and the
first-order gradients, d_x and d_y, of grid cells within an image patch.
Specifically, the internal patterns of the image patch are captured
through a set of binary tests, each of which compares the I_avg, d_x,
and d_y of a pair of grid cells (see Figs. 1a and 1b). The average
intensity and gradients provide a more complete description than
the raw pixel intensities used by other binary descriptors. Second, LDB employs a multiple-
gridding strategy to encode the structure at different spatial
granularities (see Fig. 1c). Coarse-level grids can cancel out high-
frequency noise, while fine-level grids can capture detailed local
patterns, thus enhancing distinctiveness. Third, LDB leverages a
modified AdaBoost method to select a set of salient bits. The
modified AdaBoost targets the fundamental goal of ideal binary
descriptors: minimizing distances between matches while max-
imizing them between mismatches, optimizing the performance of
LDB for a given descriptor length. Computing LDB is extremely
fast. Relying on integral images, the average intensity and first-
order gradients of each grid cell can be obtained by only 4-8 add/
subtract operations.
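
The cell-level computation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the gradients d_x and d_y are assumed to be Haar-like differences between the two halves of a cell, the grid sizes 2x2, 3x3, and 4x4 are illustrative, all function names are hypothetical, and the salient bit-selection step is omitted.

```python
import numpy as np


def integral_image(img):
    """Summed-area table with a zero first row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1]: four lookups, three add/subtract operations."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]


def cell_features(ii, y0, x0, y1, x1):
    """Return (I_avg, d_x, d_y) for one grid cell.

    Assumption: d_x and d_y are half-cell differences (right minus left,
    bottom minus top). Cells should have even width and height so that
    the two halves are the same size.
    """
    ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
    total = box_sum(ii, y0, x0, y1, x1)
    i_avg = total / ((y1 - y0) * (x1 - x0))
    right = box_sum(ii, y0, xm, y1, x1)
    bottom = box_sum(ii, ym, x0, y1, x1)
    d_x = right - (total - right)      # right half minus left half
    d_y = bottom - (total - bottom)    # bottom half minus top half
    return i_avg, d_x, d_y


def ldb_bits(patch, grids=(2, 3, 4)):
    """Binary string from pairwise cell tests over several grid sizes."""
    ii = integral_image(patch)
    h, w = patch.shape
    bits = []
    for n in grids:  # multiple gridding: one n x n grid per entry
        ys = [r * h // n for r in range(n + 1)]
        xs = [c * w // n for c in range(n + 1)]
        feats = [cell_features(ii, ys[r], xs[c], ys[r + 1], xs[c + 1])
                 for r in range(n) for c in range(n)]
        for i in range(len(feats)):
            for j in range(i + 1, len(feats)):
                # Three bits per cell pair: one each for I_avg, d_x, d_y.
                bits.extend(int(fi > fj) for fi, fj in zip(feats[i], feats[j]))
    return np.array(bits, dtype=np.uint8)
```

With these illustrative grid sizes, a 24x24 patch yields 3 x (6 + 36 + 120) = 486 candidate bits before any selection; a bit-selection stage would then keep only a subset of them for a given descriptor length.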
Our experimental results demonstrate that the construction
speed of LDB is much faster than that of SURF and is comparable
to those of the state-of-the-art binary descriptors, including ORB,
BRISK, and FREAK, while the robustness and distinctiveness of
LDB are higher than those of these descriptors.
The remainder of this paper is organized as follows: Section 2
reviews the related work. Section 3 presents details of the
proposed descriptor. In Sections 4 and 5, we compare the performance
of LDB with the state-of-the-art descriptors on public benchmarks
and evaluate its speed, robustness, and discriminative power for
mobile applications. Section 6 concludes the paper.
2 RELATED WORK
SIFT is currently among the best quality descriptors in the
literature. It relies on local gradient histograms and represents an
image patch using a 128D real-valued vector. Despite its high
descriptive power and robustness to a variety of image transformations,
the intensive computations for obtaining gradients and
the high dimensionality of SIFT make it prohibitively slow to
compute and match, especially on low-power devices. PCA-SIFT
[3] reduced the descriptor from 128D to 36D to lower the
matching cost, but the increased time for descriptor formation
almost cancels out the speedup in matching. To date, the SURF
descriptor is considered the most popular replacement for SIFT.
It greatly accelerates the gradient computations using integral
images, as in [14], while almost preserving the quality of SIFT.
188 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 36, NO. 1, JANUARY 2014
The authors are with the Electrical and Computer Engineering Department,
University of California, Santa Barbara, Harold Frank Hall, Rm
4109, Santa Barbara, CA 93106-9560.
E-mail: xinyang@umail.ucsb.edu, timcheng@ece.ucsb.edu.
Manuscript received 9 Sept. 2012; revised 7 May 2013; accepted 21 July 2013;
published online 13 Aug. 2013.
Recommended for acceptance by T. Tuytelaars.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number
TPAMI-2012-09-0713.
Digital Object Identifier no. 10.1109/TPAMI.2013.150.
0162-8828/14/$31.00 © 2014 IEEE Published by the IEEE Computer Society