Scaling Simultaneous Optimistic Optimization for High-Dimensional Non-Convex Functions with Low Effective Dimensions∗

Hong Qian and Yang Yu
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
{qianh,yuy}@lamda.nju.edu.cn
Abstract
Simultaneous optimistic optimization (SOO) is a recently proposed global optimization method with a strong theoretical foundation. Previous studies have shown that SOO performs well on low-dimensional optimization problems; however, its performance is unsatisfactory when the dimensionality is high. This paper adapts random embedding to scale up SOO, resulting in the RESOO algorithm. We prove that the simple regret of RESOO depends only on the effective dimension of the problem, while that of SOO depends on the dimension of the solution space. Empirically, on high-dimensional non-convex test functions as well as hyper-parameter tuning tasks for multi-class support vector machines, RESOO shows significantly better performance than SOO.
Introduction
Problem.
Solving sophisticated optimization problems is one of the central problems of artificial intelligence. Let $f : X \to \mathbb{R}$ be a function defined on a bounded region $X \subseteq \mathbb{R}^D$, for which we assume that a global maximizer $x^* \in X$ always exists. The optimization problem can be formally written as
$$x^* = \arg\max_{x \in X} f(x).$$
Without loss of generality, we assume $X = [-1, 1]^D$ in this paper. We treat $f$ as a black-box function that can only be evaluated point-wisely, i.e., we can only access $f(x)$ for any given solution $x \in X$. We assume that $f$ is deterministic, i.e., every call of $f(x)$ returns the same value for the same $x$.
The performance of an optimization algorithm is evaluated
by the simple regret (Bubeck, Munos, and Stoltz 2009), i.e.,
given n function evaluations, for maximization,
r
n
=max
x∈X
f(x) − f(x(n)),
where
x(n) ∈X
is the solution with the highest function
value found by the algorithm when the budget of
n
function
evaluations is used up. The simple regret measures the differ-
ence between the true maximum of
f
and the best found by
the algorithm. An algorithm is no-regret if it possesses the
desirable asymptotic property lim
n→∞
r
n
=0.
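To make the setting concrete, the following is a small Python sketch added here only for illustration: it evaluates a deterministic black-box $f$ point-wisely on $X = [-1, 1]^D$ under a budget of $n$ evaluations, using uniform random search as a placeholder optimizer, and reports the simple regret $r_n$ of the best solution found against the known maximum of a toy objective. The toy objective, the random-search baseline, and all function names are our own assumptions, not part of the paper.

```python
import numpy as np

def sphere(x):
    """Toy deterministic objective on X = [-1, 1]^D; its known maximum
    is f(x*) = 0 at x* = 0 (we maximize the negated sphere function)."""
    return -float(np.sum(x ** 2))

def random_search(f, dim, n, seed=0):
    """Placeholder optimizer: spend the budget of n point-wise
    evaluations on uniformly sampled solutions and keep the best one."""
    rng = np.random.default_rng(seed)
    best_x, best_v = None, -np.inf
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0, size=dim)
        v = f(x)                      # one point-wise call to the black box
        if v > best_v:
            best_x, best_v = x, v
    return best_x, best_v

if __name__ == "__main__":
    dim, f_star = 10, 0.0             # f_star: true maximum of the toy objective
    for n in (100, 1000, 10000):
        _, v_n = random_search(sphere, dim, n)
        print(f"n = {n:6d}   simple regret r_n = {f_star - v_n:.4f}")
```

A no-regret algorithm is exactly one for which the printed quantity converges to zero as the budget $n$ grows.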
∗This research was supported by the NSFC (61375061, 61422304, 61223003) and the Foundation for the Author of National Excellent Doctoral Dissertation of China (201451).
Related Work.
Methods with different principles have been proposed to address black-box global optimization problems. Most of them can be roughly categorized into three kinds: meta-heuristic search, deterministic Lipschitz optimization methods, and Bayesian optimization methods. Meta-heuristic search algorithms are designed with inspired heuristics, such as evolutionary strategies (Hansen, Müller, and Koumoutsakos 2003), which, however, are very weak in their theoretical foundations. Deterministic Lipschitz optimization methods require a Lipschitz continuity assumption on $f$, either globally (Pintér 1996; Kearfott 1996; Strongin and Sergeyev 2000) or locally (Kleinberg, Slivkins, and Upfal 2008; Bubeck et al. 2011; Jones, Perttunen, and Stuckman 1993; Bubeck, Stoltz, and Yu 2011; Slivkins 2011; Munos 2011), and can have sound theoretical foundations. In addition, when a function evaluation is very expensive, Bayesian optimization methods (Brochu, Cora, and de Freitas 2010; Snoek, Larochelle, and Adams 2012) are particularly suitable; they are often theoretically supported under the assumption of Gaussian process priors.
We are interested in deterministic Lipschitz optimization methods, and particularly the recently proposed Simultaneous Optimistic Optimization (SOO) algorithm (Munos 2011; 2014), since the local Lipschitz continuity that SOO requires is intuitive, easy to satisfy, and relatively easy to verify. SOO incorporates an optimistic estimation of the function value with the branch-and-bound principle. It is worth mentioning that, although SOO assumes local Lipschitz continuity with respect to some semi-metric, this semi-metric fortunately does not need to be known. Because of these advantages, SOO has attracted attention; for example, it has been hybridized with Bayesian optimization to eliminate the optimization of acquisition functions (Wang et al. 2014). Also, variants of SOO have been proposed for, e.g., stochastic optimization (Valko, Carpentier, and Munos 2013) and parallel optimistic optimization (Grill, Valko, and Munos 2015).
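To make the branch-and-bound principle behind SOO concrete, below is a minimal Python sketch of the deterministic algorithm on $X = [-1, 1]^D$, written from the description in Munos (2011) rather than taken from any released implementation: the space is partitioned into a ternary tree, every sweep expands the best-valued leaf of each depth whenever its value is at least the best value selected so far in that sweep, and the tree depth is capped by $h_{\max}(t) = \sqrt{t}$. The splitting rule (trisect a cell along its longest side), the depth schedule, and the re-evaluation of every child center are simplifying assumptions of this sketch.

```python
import numpy as np

def soo_maximize(f, dim, n_evals, k=3):
    """Minimal SOO sketch: optimistic expansion of a k-ary partition
    tree over X = [-1, 1]^dim (maximization of a deterministic f)."""
    lower, upper = -np.ones(dim), np.ones(dim)
    center = (lower + upper) / 2.0
    v0 = f(center)
    evals = [1]                                     # f-evaluations spent so far
    leaves = {0: [(lower, upper, center, v0, 0)]}   # leaves grouped by depth
    best_x, best_v = center, v0

    def expand(node):
        """Split the node's cell into k sub-cells along its longest side
        and evaluate f at each child's center."""
        lo, up, _, _, h = node
        d = int(np.argmax(up - lo))                 # longest dimension of the cell
        width = (up[d] - lo[d]) / k
        children = []
        for j in range(k):
            c_lo, c_up = lo.copy(), up.copy()
            c_lo[d] = lo[d] + j * width
            c_up[d] = lo[d] + (j + 1) * width
            c = (c_lo + c_up) / 2.0
            children.append((c_lo, c_up, c, f(c), h + 1))
            evals[0] += 1
        return children

    t = 1                                           # expansions performed so far
    while evals[0] < n_evals:
        v_max = -np.inf                             # best value selected in this sweep
        h_cap = int(np.sqrt(t))                     # depth cap h_max(t) = sqrt(t)
        for h in sorted(leaves.keys()):
            if h > h_cap or not leaves[h]:
                continue
            i = max(range(len(leaves[h])), key=lambda j: leaves[h][j][3])
            node = leaves[h][i]
            if node[3] >= v_max:                    # optimistic: expand the best leaf per depth
                v_max = node[3]
                leaves[h].pop(i)
                for child in expand(node):
                    leaves.setdefault(h + 1, []).append(child)
                    if child[3] > best_v:
                        best_x, best_v = child[2], child[3]
                t += 1
                if evals[0] >= n_evals:
                    break
    return best_x, best_v
```

For instance, soo_maximize(lambda x: -float(np.sum(x ** 2)), dim=2, n_evals=200) returns a point close to the maximizer of the toy objective while accessing $f$ only point-wisely; no Lipschitz constant or semi-metric is ever supplied.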
However, previous studies have noticed that SOO may perform poorly in high-dimensional optimization problems (Preux, Munos, and Valko 2014; Derbel and Preux 2015). Meanwhile, it has been observed that in a wide range of high-dimensional optimization problems, such as hyper-parameter optimization in neural networks (Bergstra and Bengio 2012), intrinsically only a few low dimensions are effective. For these optimization problems with low effective