[Udell et al. 2014]. These approaches reliably solve modest-size problems, with on the order of 10,000s of variables, but for image optimization problems with millions of variables these solvers become infeasible due to their memory and computational cost. There have been several different approaches towards making an optimization DSL or framework that can handle large problems such as
occur in image optimization. The approach in [Diamond and Boyd
2015] extends CVXPY to recognize and exploit fast linear transforms,
such as convolution and the discrete Fourier transform. The Epsilon
framework takes advantage of fast proximal operators for individual
functions, transforming problems so they can be efficiently solved by
a variant of ADMM [Wytock et al. 2015]. The TFOCS framework
makes it easy to apply a variety of proximal and first order algorithms
to optimization problems, and accommodates fast linear transforms
[Becker et al. 2011]. None of these systems can compete with existing specialized solvers for individual image processing problems, however, and they are also limited to convex problems.
3 Representing Image Optimization Problems
We model an image optimization problem as a sum of penalties $f_i$ on linear transforms $K_i x$, with $x \in \mathbb{R}^n$ being the unknown:
$$\operatorname*{argmin}_x \; \sum_{i=1}^{I} f_i(K_i x) \quad \text{with} \quad K = \begin{bmatrix} K_1 \\ \vdots \\ K_I \end{bmatrix}, \qquad (2)$$
where $K \in \mathbb{R}^{m \times n}$ is one large matrix that is composed of the stacked linear operators $K_1, \ldots, K_I$. The linear operator $K_i \in \mathbb{R}^{m_i \times n}$ selects a subset of $m_i$ rows of $Kx$. This subset of rows is then the input for the penalty function $f_i : \mathbb{R}^{m_i} \to \mathbb{R}$.
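As a minimal illustration (not part of ProxImaL's API), the objective in Problem (2) can be evaluated by pairing each penalty with its linear operator; the helper name `objective` and the callables below are hypothetical:

```python
# Hypothetical sketch: evaluate sum_i f_i(K_i x) given callables for the
# penalties f_i and the (matrix-free) linear operators K_i.
def objective(x, penalties, operators):
    return sum(f(K(x)) for f, K in zip(penalties, operators))
```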
Image optimization problems generally contain
• variables representing the image(s) to be reconstructed,
• a forward model of image formation in terms of linear operators,
• a penalty based on the difference of the results of this forward model from measured data,
• and priors and constraints on the variables.
For example, consider a slightly more complex version of the deconvolution problem (1), where the convolved image $Dx$ is subsampled by a known demosaicking pattern, which we represent with the linear operator $M$. We formulate our problem using a sum-of-squares error metric, $f(x) = \|MDx - b\|_2^2$, and the penalty function:
$$r(x) = \mu\|\nabla x\|_1 + (1-\mu)\|\nabla x\|_2^2 + I_{[0,\infty)}(x),$$
where $\mu \in [0,1]$, $\nabla$ is the gradient operator, and:
$$I_{[0,\infty)}(x) = \begin{cases} 0, & \text{if } x \geq 0 \\ \infty, & \text{otherwise.} \end{cases}$$
The penalty function encodes the priors that many gradients are zero and the pixel values are nonnegative. Problem (3) shows the full optimization problem and how we represent it in the form of Problem (2):
$$x^{\mathrm{opt}} = \operatorname*{argmin}_x \; \|MDx - b\|_2^2 + r(x) \qquad (3)$$
$$r(x) = \mu\|\nabla x\|_1 + (1-\mu)\|\nabla x\|_2^2 + I_{[0,\infty)}(x) \qquad (4)$$
model:
$$\begin{aligned}
f_1(v) &= \|v - b\|_2^2, & K_1 &= MD \\
f_2(v) &= \mu\|v\|_1, & K_2 &= \nabla \\
f_3(v) &= (1-\mu)\|v\|_2^2, & K_3 &= \nabla \\
f_4(v) &= I_{[0,\infty)}(v), & K_4 &= I
\end{aligned} \qquad (5)$$
Note that there are other ways to represent the problem in our standard form. For example, we could use:
$$f_1(v) = \|Mv - b\|_2^2, \qquad K_1 = D.$$
A key insight is that the choice of representation can drastically
affect the performance of the solver algorithms. We take advantage
of this fact and provide strategies to find an optimal reformulation.
The only assumption we make about the penalty functions $f_1, \ldots, f_I$ is that they provide a black box for evaluating the function's proximal operator. The proximal operator of a function $f$ is defined as:
$$\operatorname{prox}_{\tau f}(v) = \operatorname*{argmin}_x \; f(x) + \frac{1}{2\tau}\|x - v\|_2^2,$$
where $\tau > 0$ and $v \in \mathbb{R}^{m_i}$ [Parikh and Boyd 2013]. The proximal
operator optimizes over the function in isolation, but incorporates
a quadratic term that can be used to link the optimization with
a broader algorithm. Many algorithms can be carried out using
proximal operators that cannot be carried out using the traditional
approach of interacting with functions by computing their gradients
and Hessians [Parikh and Boyd 2013].
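As an illustration of the definition above, the penalties in the deconvolution example all admit simple closed-form proximal operators. The following sketch (plain NumPy, not ProxImaL code) evaluates them directly:

```python
import numpy as np

# f(x) = ||x||_1: the prox is elementwise soft thresholding at tau.
def prox_l1(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# f(x) = ||x - b||_2^2: setting the gradient of f(x) + (1/(2 tau))||x - v||_2^2
# to zero gives the closed form below.
def prox_sum_squares(v, tau, b):
    return (v + 2.0 * tau * b) / (1.0 + 2.0 * tau)

# f(x) = I_{[0, inf)}(x): the prox is projection onto the nonnegative orthant.
def prox_nonneg(v, tau):
    return np.maximum(v, 0.0)
```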
Similarly, the only assumption we make about each linear operator $K_i$ is that it provides a black box for evaluating the forward operator $x \to K_i x$ and the adjoint operator $z \to K_i^T z$. This is a useful abstraction because many linear operators that arise in optimization problems from image processing are fast transforms, i.e., they have methods for evaluating the forward and adjoint operator that are more efficient than standard multiplication by the operator represented as a dense or sparse matrix. Common fast transforms in image processing include the discrete Fourier transform (DFT), convolution, and wavelet transforms; see [Diamond and Boyd 2016a] for
many more examples.
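For instance, circular 2D convolution can be exposed through exactly this interface without ever forming the corresponding matrix; the adjoint of convolving with a real kernel is correlation, i.e., multiplication by the conjugated kernel spectrum. The class below is a hedged sketch of this idea (the name `ConvOperator` is illustrative, not ProxImaL's implementation):

```python
import numpy as np

class ConvOperator:
    # Matrix-free 2D circular convolution: forward applies the kernel,
    # adjoint applies the matched correlation kernel; both use FFTs.
    def __init__(self, kernel, shape):
        padded = np.zeros(shape)
        kh, kw = kernel.shape
        padded[:kh, :kw] = kernel              # zero-pad kernel to image size
        self.kernel_hat = np.fft.fft2(padded)  # precompute its spectrum

    def forward(self, x):   # x -> K x
        return np.real(np.fft.ifft2(self.kernel_hat * np.fft.fft2(x)))

    def adjoint(self, z):   # z -> K^T z
        return np.real(np.fft.ifft2(np.conj(self.kernel_hat) * np.fft.fft2(z)))
```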
For simplicity, we assume that all linear operators are maps from a multidimensional real space $\mathbb{R}^{n_1 \times \cdots \times n_k}$ to another multidimensional real space $\mathbb{R}^{m_1 \times \cdots \times m_\ell}$. Complex-valued linear operators such as the DFT are represented as real-valued operators using the standard embedding of a complex vector in $\mathbb{C}^{n_1 \times \cdots \times n_k}$ as a real vector in $\mathbb{R}^{2n_1 \times \cdots \times n_k}$.
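As a concrete sketch of this embedding (illustrative names, not ProxImaL code), the 1D DFT of a real length-$n$ signal can be exposed as a real-valued operator from $\mathbb{R}^n$ to $\mathbb{R}^{2 \times n}$ by stacking real and imaginary parts; the adjoint of the unnormalized DFT is $n$ times the inverse DFT:

```python
import numpy as np

class RealDFTOperator:
    # The 1D DFT of a real length-n signal as a real linear map from
    # R^n to R^(2 x n): row 0 is the real part, row 1 the imaginary part.
    def __init__(self, n):
        self.n = n

    def forward(self, x):   # R^n -> R^(2 x n)
        z = np.fft.fft(x)
        return np.stack([z.real, z.imag])

    def adjoint(self, y):   # R^(2 x n) -> R^n
        z = y[0] + 1j * y[1]
        # Adjoint of the unnormalized DFT is n * ifft; the result is real
        # up to numerical error because the domain is real-valued.
        return np.real(self.n * np.fft.ifft(z))
```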
We call algorithms that solve Problem (2) using only these black boxes proximal, matrix-free solvers. All solver algorithms in ProxImaL are proximal, matrix-free solvers. ProxImaL currently supports the Pock-Chambolle algorithm, ADMM, linearized ADMM, and half-quadratic splitting. See the supplement for detailed derivations showing that all of these methods fit into our framework from (2). These algorithms can solve Problem (2) when the functions $f_1, \ldots, f_I$ are convex.
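To make the black-box interface concrete, the following is a minimal sketch of a Pock-Chambolle-style primal-dual iteration for Problem (2) that touches each $f_i$ only through its proximal operator and each $K_i$ only through forward and adjoint evaluations. It is a simplified illustration, assuming convex $f_i$ and step sizes with $\tau\sigma\|K\|^2 \leq 1$; it is not ProxImaL's actual implementation:

```python
import numpy as np

def pock_chambolle(prox_fns, K_fwds, K_adjs, x0, tau, sigma, iters=100):
    # prox_fns[i](v, t) returns prox_{t * f_i}(v); K_fwds[i] / K_adjs[i]
    # evaluate K_i and K_i^T as black boxes.
    x = x0.copy()
    x_bar = x0.copy()
    y = [np.zeros_like(Kf(x0)) for Kf in K_fwds]  # one dual variable per term

    for _ in range(iters):
        # Dual update: prox of the conjugate f_i^* via the Moreau identity
        #   prox_{sigma f*}(v) = v - sigma * prox_{f / sigma}(v / sigma).
        for i, (Kf, prox) in enumerate(zip(K_fwds, prox_fns)):
            v = y[i] + sigma * Kf(x_bar)
            y[i] = v - sigma * prox(v / sigma, 1.0 / sigma)
        # Primal update; there is no separate g(x) term, so its prox is
        # the identity.
        x_new = x - tau * sum(Ka(y[i]) for i, Ka in enumerate(K_adjs))
        # Over-relaxation.
        x_bar = 2 * x_new - x
        x = x_new
    return x
```

For the representation in (5), `prox_fns` would hold the proximal operators of the four penalties, and `K_fwds`/`K_adjs` the mosaicked convolution, the gradient (twice), and the identity.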
Much state-of-the-art image optimization, however, makes use of nonconvex penalty functions, in applications ranging from denoising and deconvolution to burst reconstruction and registration. Patch-based approaches and hard thresholding in particular have been very successful for image reconstruction problems [Krishnan and Fergus 2009; Danielyan et al. 2012; Heide et al. 2014].
Surprisingly, the same proximal, matrix-free solvers that work for convex problems yield good results for certain problems that include nonconvex penalty functions [Danielyan et al. 2012; Heide et al. 2014; Hallac et al. 2015]. There is often no guarantee that the algorithms will converge (see conditions in [Ochs et al. 2014] for exceptions). Furthermore, there is no guarantee that they find the optimal $x$, but empirically for many problems with nonconvex penalties the algorithms do produce good results in a reasonable number of iterations.