-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Suboptimal choice of alpha in Hager-Zhang #173
Comments
I've forked H-Z with a fix: https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl . |
Either it's not thought of in the original implementation, or there's a bug here perhaps. I'll have to look at the code and maybe my other implementation in NLSolvers.jl. Maybe the paper as well. |
For reference, here's a plot of the values: I think HZ assumes the function is differentiable. If it isn't (obviously I can't tell from the plot), then another line search method might be better. But that exit line you identified looks like a "I'm confused, let's bail" and so it would make sense to take an |
The original function was differentiable, and taking the minimum is a solution that works for me. I can make a PR if it's an acceptable fix, though this issue may be a symptom of a bug in HZ. |
Interesting. Is it a case you can share? I can't promise to spend time on it, but I am intrigued. |
It was some variant of Rosenbrock similar to this one: https://github.com/JuliaManifolds/Manopt.jl/pull/339/files#diff-3f760f49cdd6fe09857899f193de7724904da2185f32c61378c068841df87cebR245-R267 , likely with a different |
It is a thing that I encounter sometimes in optimization on compact manifolds: gradient is way longer than injectivity radius and the initial alpha equal to 1 is a really poor guess. |
@mateuszbaran, if you can't easily share a reproducer, one option is to run with option |
Here is a (relatively) simple reproducer: using Manopt
using Manifolds
using LineSearches
using LinearAlgebra
norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)
function f_rosenbrock_manopt(::AbstractManifold, x)
result = 0.0
for i in 1:2:length(x)
result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
end
return result
end
function g_rosenbrock_manopt!(M::AbstractManifold, storage, x)
for i in 1:2:length(x)
storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
end
riemannian_gradient!(M, storage, x, storage)
return storage
end
function test_case_manopt()
N = 4000
mem_len = 2
M = Manifolds.Sphere(N - 1; parameter=:field)
ls_hz = LineSearches.HagerZhang()
x0 = zeros(N)
x0[1] = 1
manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)
return quasi_Newton(
M,
f_rosenbrock_manopt,
g_rosenbrock_manopt!,
x0;
stepsize=Manopt.LineSearchesStepsize(ls_hz),
evaluation=InplaceEvaluation(),
vector_transport_method=ProjectionTransport(),
return_state=true,
memory_size=mem_len,
stopping_criterion=manopt_sc,
)
end
test_case_manopt() It gives different numbers than the original example but the issue is essentially the same. The problem happens during the third iteration of minimization using Manopt. |
secant2 calls update, and update might switch to bisection in the U3 step. If we did bisect, the line-search progress step L2 testing whether the interval is shrinking fast enough is nonsensical (we might have shrunk multiple iterations of bisection, but that is not an indication that the secant model was "working"). Consequently, this reports back about whether bisection was engaged in `update`, and if so skip any kind of convergence assessment and do another iteration. Fixes JuliaNLSolvers#173.
Thanks both of you for taking the initiative here! |
I've encountered a weird issue with alpha value selection in Hager-Zhang. Let me illustrate. Here are the values computed during bracketing:
The search ends on this line
LineSearches.jl/src/hagerzhang.jl
Line 287 in ded667a
iA=1
,ia=1
,ib=4
andiB=9
. Clearly, out of these four,iA
is the worst choice (highest cost value) but nevertheless it's the value returned by the procedure. My function isn't particularly well-behaved but shouldn't the line search try a bit harder to pick the best value? I mean, it could just look atib
andiB
instead of just returningiA
in that line. What's worse here, the returned alpha is 0 which throws off the direction selection algorithm I'm using.The text was updated successfully, but these errors were encountered: