In a nutshell, the primal dual algorithm finds the optimal z
?
(along with optimal dual vectors
ν
?
and λ
?
) by solving this system of nonlinear equations. The solution procedure is the classical
Newton method: at an interior point (z
k
, ν
k
, λ
k
) (by which we mean f
i
(z
k
) < 0, λ
k
> 0), the
system is linearized and solved. However, the step to new point (z
k+1
, ν
k+1
, λ
k+1
) must be
modified so that we remain in the interior.
In practice, we relax the complementary slackness condition λ
i
f
i
= 0 to
λ
k
i
f
i
(z
k
) = −1/τ
k
, (2)
where we judiciously increase the parameter τ
k
as we progress through the Newton iterations.
This biases the solution of the linearized equations towards the interior, allowing a smooth, well-
defined “central path” from an interior point to the solution on the boundary (see [15,20] for an
extended discussion).
The primal, dual, and central residuals quantify how close a point (z, ν, λ) is to satisfying (KKT )
with (2) in place of the slackness condition:
r
dual
= c
0
+ A
T
0
ν +
X
i
λ
i
c
i
r
cent
= −Λf − (1/τ )1
r
pri
= A
0
z − b,
where Λ is a diagonal matrix with (Λ)
ii
= λ
i
, and f =
f
1
(z) . . . f
m
(z)
T
.
From a point (z, ν, λ), we want to find a step (∆z , ∆ν, ∆λ) such that
r
τ
(z + ∆z, ν + ∆ν, λ + ∆λ) = 0. (3)
Linearizing (3) with the Taylor expansion around (z, ν, λ),
r
τ
(z + ∆z, ν + ∆ν, λ + ∆λ) ≈ r
τ
(z, ν, λ) + J
r
τ
(z, νλ)
∆z
∆ν
∆λ
,
where J
r
τ
(z, νλ) is the Jacobian of r
τ
, we have the system
0 A
T
0
C
T
−ΛC 0 −F
A
0
0 0
∆z
∆v
∆λ
= −
c
0
+ A
T
0
ν +
P
i
λ
i
c
i
−Λf − (1/τ)1
A
0
z − b
,
where m × N matrix C has the c
T
i
as rows, and F is diagonal with (F )
ii
= f
i
(z). We can
eliminate ∆λ using:
∆λ = −ΛF
−1
C∆z − λ − (1/τ)f
−1
(4)
leaving us with the core system
−C
T
F
−1
ΛC A
T
0
A
0
0
∆z
∆ν
=
−c
0
+ (1/τ)C
T
f
−1
− A
T
0
ν
b − A
0
z
. (5)
With the (∆z, ∆ν, ∆λ) we have a step direction. To choose the step length 0 < s ≤ 1, we ask
that it satisfy two criteria:
1. z + s∆z and λ + s∆λ are in the interior, i.e. f
i
(z + s∆z) < 0, λ
i
> 0 for all i.
2. The norm of the residuals has decreased sufficiently:
kr
τ
(z + s∆z, ν + s∆ν, λ + s∆λ)k
2
≤ (1 − αs) · kr
τ
(z, ν, λ)k
2
,
where α is a user-s precified parameter (in all of our implementations, we have set α = 0.01).
4