problems, such as the maximum satisfiability (MaxSAT) problem, in which the objective is to find a variable assignment that maximises the number of satisfied CNF clauses.
The first large collection of features for SAT (and thus also MaxSAT) instances was provided by Nudelman, Leyton-Brown, Hoos et al. (2004). Despite the rather simple structure of SAT instances, the authors devised nine different feature sets and a total of 91 features, which characterise a given CNF formula from a multitude of perspectives. Eleven problem size features describe SAT instances based on summary statistics of their numbers of clauses and variables. A set of variable-clause graph (VCG) features comprises ten node degree statistics based on a bipartite graph over the variables and clauses of a given instance. Interactions between the variables are captured by four variable graph (VG) features; these are the minimum, maximum, mean, and coefficient of variation of the node degrees for a graph of variables, in which edges connect pairs of variables that jointly occur in at least one clause. Similarly, the set of clause graph (CG) features contains seven node degree statistics of a graph whose edges connect clauses that have at least one variable in common, as well as three features based on weighted clustering coefficients for the clause graph. Thirteen balance features capture the balance between negated and unnegated variables per clause and across all clauses, as well as the fractions of unary, binary, and ternary clauses, whereas six further features quantify the degree to which the given formula F resembles a Horn formula (a restricted type of CNF formula, for which SAT can be decided efficiently). The solution of a linear program representing the given SAT instance provides the basis for six LP-based features. Finally, there are two sets of so-called probing features, which are based on performance statistics over short runs of several well-known SAT algorithms (based on DPLL and stochastic local search, two prominent approaches to solving SAT) and capture the degree to which these make early progress on the given instance.
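
To make these definitions more concrete, the following Python sketch computes a small, illustrative subset of such features for a CNF formula given as a list of clauses over signed integer literals (DIMACS convention). It is not the original feature extractor of Nudelman, Leyton-Brown, Hoos et al. (2004); the function name and the selection of features are ours:

    from itertools import combinations
    from statistics import mean, pstdev

    def basic_sat_features(clauses, num_vars):
        # `clauses`: list of clauses, each a list of non-zero signed ints
        # (positive = unnegated variable, negative = negated variable).
        features = {}

        # problem size features
        num_clauses = len(clauses)
        features["num_clauses"] = num_clauses
        features["num_variables"] = num_vars
        features["clauses_per_variable"] = num_clauses / num_vars

        # variable graph (VG): nodes are variables, edges connect variables
        # that jointly occur in at least one clause
        edges = set()
        for clause in clauses:
            for u, v in combinations(sorted({abs(lit) for lit in clause}), 2):
                edges.add((u, v))
        degree = {v: 0 for v in range(1, num_vars + 1)}
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        degs = list(degree.values())
        mean_deg = mean(degs)
        features["vg_degree_min"] = min(degs)
        features["vg_degree_max"] = max(degs)
        features["vg_degree_mean"] = mean_deg
        # coefficient of variation of the node degrees
        features["vg_degree_cv"] = pstdev(degs) / mean_deg if mean_deg > 0 else 0.0

        # balance features: fraction of unnegated literals per clause,
        # and fractions of unary, binary, and ternary clauses
        pos_fracs = [sum(lit > 0 for lit in c) / len(c) for c in clauses]
        features["pos_literal_frac_mean"] = mean(pos_fracs)
        features["unary_frac"] = sum(len(c) == 1 for c in clauses) / num_clauses
        features["binary_frac"] = sum(len(c) == 2 for c in clauses) / num_clauses
        features["ternary_frac"] = sum(len(c) == 3 for c in clauses) / num_clauses

        # Horn feature: fraction of clauses with at most one unnegated literal
        features["horn_clause_frac"] = (
            sum(sum(lit > 0 for lit in c) <= 1 for c in clauses) / num_clauses
        )
        return features

For instance, basic_sat_features([[1, -2, 3], [-1, 2], [2]], num_vars=3) returns these statistics for a three-variable formula with one ternary, one binary, and one unary clause.
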
Some of the feature sets (specifically, the CG, VG, and LP-based features, as well as some of the VCG, balance, and DPLL-probing features) are computationally quite expensive (see, e.g., Xu et al., 2008; Hutter, Hoos et al., 2014) and consequently not always useful in the context of practical algorithm selection approaches. Similarly, the algorithm runs underlying the probing features are limited to a very small part of the overall time budget for solving a given instance, to make sure that sufficient time remains available for running the selected SAT solver.
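
As a sketch of how such a cap might be enforced (the fraction and the solver interface below are hypothetical, not taken from the cited systems), probing can simply be stopped once a small share of the per-instance time budget has been consumed:

    import time

    def probe(solver_step, instance, time_budget, probing_fraction=0.02):
        # `solver_step` is a hypothetical callable performing one bounded
        # chunk of solver work on `instance` and returning a progress value.
        # Probing stops once `probing_fraction` of the time budget is spent,
        # so the remaining budget stays available for the selected solver.
        deadline = time.monotonic() + probing_fraction * time_budget
        progress = []
        while time.monotonic() < deadline:
            progress.append(solver_step(instance))
        return {
            "probing_steps": len(progress),
            "best_progress": max(progress, default=0.0),
        }
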
A decade later, building on the work by Nudelman, Leyton-Brown, Hoos et al. (2004), Hutter, Hoos et al. (2014) introduced a set of 138 SAT features. While they removed some features from the earlier sets, much of the set remained the same. The most significant changes were an extension of the CG and VG feature sets by five new features each, as well as three new feature sets accounting for an additional 48 features. The VG feature set was extended by so-called diameter features, which capture statistics based on the longest shortest paths from each variable to any other variable in the graph. Also, instead of the weighted clustering coefficients based on the CG (as done by Nudelman, Leyton-Brown, Hoos et al., 2004), Hutter, Hoos et al. (2014) used a set of clustering coefficients that measure the CG's "local cliqueness."
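
The diameter and clustering-coefficient statistics just mentioned can be sketched as follows; the adjacency-list representation of the variable and clause graphs, as well as the function names, are ours, and the exact summary statistics used by Hutter, Hoos et al. (2014) may differ:

    from collections import deque

    def eccentricities(adjacency):
        # Longest shortest path from each node to any node reachable from it,
        # computed by breadth-first search on an undirected graph given as
        # {node: set_of_neighbours}; summary statistics over these values
        # yield diameter-style features.
        ecc = {}
        for source in adjacency:
            dist = {source: 0}
            queue = deque([source])
            while queue:
                node = queue.popleft()
                for neighbour in adjacency[node]:
                    if neighbour not in dist:
                        dist[neighbour] = dist[node] + 1
                        queue.append(neighbour)
            ecc[source] = max(dist.values())  # 0 for isolated nodes
        return ecc

    def clustering_coefficients(adjacency):
        # Local clustering coefficient ("local cliqueness") per node: the
        # fraction of pairs of neighbours that are themselves connected.
        coeffs = {}
        for node, neighbours in adjacency.items():
            k = len(neighbours)
            if k < 2:
                coeffs[node] = 0.0
                continue
            links = sum(v in neighbours for u in neighbours for v in adjacency[u]) / 2
            coeffs[node] = links / (k * (k - 1) / 2)
        return coeffs
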
Furthermore, they introduced 18 novel clause learning features, which summarise information gathered during short runs of ZCHAFF, a prominent SAT solver that learns conflict clauses during its search for a satisfying assignment (Mahajan et al., 2004). Another 18 features are derived from estimates of variable bias obtained from the SAT solver VARSAT (Hsu and McIlraith, 2009); these features essentially capture statistics over estimates of the probability that each variable is true, false, or unconstrained in every satisfying assignment. Finally, Hutter, Hoos et al. (2014) proposed to use the actual feature costs,