Gao J, et al. Sci China Inf Sci March 2015 Vol. 58 032104:3
instances into OBDD and d-DNNF respectively, which shows that these languages exhibit the easy–hard–easy pattern: the sizes of the compilation results increase as the ratio of clauses to variables grows from 0, but decrease once the ratio exceeds a certain point. We also observe a polynomial-to-exponential phase transition in the growth of compilation sizes when compiling instances with a very small ratio of #clauses to #variables. Additionally, we analyze a phase transition phenomenon on prime implicants and point out that the average length of the prime implicants is nearly half the total number of variables around the peak point of the easy–hard–easy pattern.
2 Background of the random k-SAT model
Many academic and real-world problems can be transformed into k-SAT problems and solved by efficient SAT solvers. However, k-SAT is well known to be NP-complete for k > 2, so designing search algorithms that solve SAT problems efficiently is an essential issue in artificial intelligence. To evaluate the efficiency of SAT algorithms, Mitchell et al. [30] introduced a model for generating random SAT instances, which was claimed to produce hard instances. Compared with instances derived from industrial applications, this model can supply a large number of instances whose size and difficulty are controlled by parameters. As a result, it has become a canonical benchmark for testing various SAT algorithms. Meanwhile, theoretical analyses of the random k-SAT model have been carried out [3]; for example, phase-transition properties have been identified that can be used to analyze average computational complexity.
In the SAT model discussed in [30], three parameters k, n, and m are employed to generate SAT instances, where k is the length of the clauses, n is the number of variables, and m is the number of clauses. We use the ratio r to denote m/n. Instances are produced by generating m clauses uniformly and independently, where each clause is generated by selecting k variables without replacement from the n variables and negating each variable with probability 0.5, so all generated clauses have the same length.
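The generation procedure above can be sketched in a few lines of Python; the function name and the DIMACS-style signed-integer encoding of literals are our own conventions, not part of the model in [30].

```python
import random

def random_ksat(k, n, m, rng=random):
    """Generate a random k-SAT instance with m clauses over n variables.

    Each clause selects k distinct variables uniformly without replacement
    and negates each with probability 0.5. Literals are signed integers,
    DIMACS-style: +i for variable x_i, -i for its negation.
    """
    clauses = []
    for _ in range(m):
        chosen = rng.sample(range(1, n + 1), k)
        clause = tuple(v if rng.random() < 0.5 else -v for v in chosen)
        clauses.append(clause)
    return clauses

# Example: a 3-SAT instance near the transition ratio r = m/n ≈ 4.3
inst = random_ksat(k=3, n=20, m=86)
```

Because variables are drawn without replacement, no clause contains a repeated or complementary pair of literals on the same variable, and every clause has exactly k literals.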
Empirical and theoretical studies have demonstrated that there is an easy–hard–easy pattern when solving random k-SAT instances, regardless of the algorithm used. Many experiments have shown that a soluble-to-insoluble phase transition occurs when r reaches a fixed value. For example, when r is about 4.3 in 3-SAT [30], the mean runtime spent finding a solution by SAT algorithms is often extremely high, because the generated instances lie in the phase-transition region. As r increases from 0, mean runtimes first increase and then decrease once r passes the value at the transition point. However, locating the exact positions of the transition points in k-SAT by theoretical proof appears to be difficult, as only lower and upper bounds on these positions have been found [3]. On the other hand, a similar phenomenon also exists in counting problems (e.g., #SAT [19] and #CSP [31]). In the general case, #SAT is #P-complete and is therefore considered harder than SAT; algorithms for this problem must count the number of solutions of a given SAT instance. The mean runtimes of several #SAT solvers have been investigated, and the results show that those solvers also exhibit the easy–hard–easy pattern, although the hardest regions differ among the #SAT algorithms. For instance, it was reported that an algorithm called Counting Davis–Putnam [32] reaches its peak runtime at r = 1.2 in #3-SAT experiments, whereas the peak point of another algorithm, Decomposing Davis–Putnam [33], occurs at r = 1.5.
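To make the #SAT task concrete, here is a minimal brute-force model counter for the clause encoding used above. It is our own illustrative sketch, exponential in n, and is in no way representative of the Davis–Putnam-based counters [32,33], which prune the search far more aggressively.

```python
from itertools import product

def count_models(clauses, n):
    """Brute-force #SAT: count the assignments over n variables that
    satisfy every clause. Enumerates all 2^n assignments, so it is
    usable only for tiny instances."""
    count = 0
    for bits in product([False, True], repeat=n):
        # bits[i-1] is the truth value assigned to variable i;
        # a positive literal l is true iff bits[l-1] is True.
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in cl)
               for cl in clauses):
            count += 1
    return count
```

For example, the single clause (x1 ∨ x2) over two variables rules out only the all-false assignment, so `count_models([(1, 2)], 2)` returns 3.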
In the following sections, we will show that knowledge compilation also undergoes the easy–hard–easy pattern.
3 Basic conjecture
Let $X_L(P)$ be the size of the compilation result, where $L$ is the target language to which the SAT instance $P$ is compiled (when $L$ is OBDD or d-DNNF, we take the number of nodes as an approximate measure of the size).

Definition 1. Given $n$, $r$, $k$ and a target language $L$, the average size $E(X_L(P^{nr}_k))$ is defined as follows:
$$E(X_L(P^{nr}_k)) = \Big(\sum X_L(P^{nr}_k)\Big) \Big/ \, |P^{nr}_k| \quad (1)$$
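Since the instance class $P^{nr}_k$ is far too large to enumerate, the expectation in (1) is in practice estimated by averaging over sampled instances. The sketch below assumes a caller-supplied function `size_L` that maps an instance to the node count of its compilation into $L$ (e.g., via an external OBDD or d-DNNF compiler); that function is a hypothetical placeholder, not something defined in the paper.

```python
import random

def average_compiled_size(n, r, k, size_L, samples=100, rng=random):
    """Monte-Carlo estimate of E(X_L(P^{nr}_k)): draw random k-SAT
    instances with m = round(r * n) clauses over n variables and
    average size_L over them. `size_L` maps an instance (a list of
    clauses) to the size of its compilation into language L."""
    m = round(r * n)
    total = 0
    for _ in range(samples):
        clauses = [tuple(v if rng.random() < 0.5 else -v
                         for v in rng.sample(range(1, n + 1), k))
                   for _ in range(m)]
        total += size_L(clauses)
    return total / samples
```

As a sanity check, plugging in `size_L = len` (the clause count rather than a real compilation size) must return exactly m, since every sampled instance has m clauses.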