Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal choice of alpha in Hager-Zhang #173

Open
mateuszbaran opened this issue Dec 30, 2023 · 10 comments · May be fixed by #174 or #180
Open

Suboptimal choice of alpha in Hager-Zhang #173

mateuszbaran opened this issue Dec 30, 2023 · 10 comments · May be fixed by #174 or #180

Comments

@mateuszbaran
Copy link
Contributor

I've encountered a weird issue with alpha value selection in Hager-Zhang. Let me illustrate. Here are the values computed during bracketing:

alphas = [0.0, 1.0, 0.5, 0.25, 0.11797881082031754, 0.05898940541015877, 0.029494702705079385, 0.014747351352539692, 0.007373675676269846]
values = [32867.0, 33089.907756650115, 33027.619212962294, 32749.607153359673, 33107.9648525175, 33056.34280854859, 32990.02494986772, 32936.18207166336, 32831.66557482673]
slopes = [-171068.0, -29730.999177119797, -74242.25565689849, 191429.29678267744, -7900.963612183026, -51707.79133751766, -103642.35349017732, -139136.2036028941, 182648.13853509305]

The search ends on this line

return A, values[iA]
, with iA=1, ia=1, ib=4 and iB=9. Clearly, out of these four, iA is the worst choice (highest cost value) but nevertheless it's the value returned by the procedure. My function isn't particularly well-behaved but shouldn't the line search try a bit harder to pick the best value? I mean, it could just look at ib and iB instead of just returning iA in that line. What's worse here, the returned alpha is 0 which throws off the direction selection algorithm I'm using.

@mateuszbaran
Copy link
Contributor Author

I've forked H-Z with a fix: https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl .

@pkofod
Copy link
Member

pkofod commented Jan 16, 2024

Either it's not thought of in the original implementation, or there's a bug here perhaps. I'll have to look at the code and maybe my other implementation in NLSolvers.jl. Maybe the paper as well.

@timholy
Copy link
Contributor

timholy commented Jan 22, 2024

For reference, here's a plot of the values:

ls

I think HZ assumes the function is differentiable. If it isn't (obviously I can't tell from the plot), then another line search method might be better.

But that exit line you identified looks like a "I'm confused, let's bail" and so it would make sense to take an argmin(values) and return that alpha.

@mateuszbaran
Copy link
Contributor Author

The original function was differentiable, and taking the minimum is a solution that works for me. I can make a PR if it's an acceptable fix, though this issue may be a symptom of a bug in HZ.

@timholy
Copy link
Contributor

timholy commented Jan 22, 2024

Interesting. Is it a case you can share? I can't promise to spend time on it, but I am intrigued.

@mateuszbaran
Copy link
Contributor Author

It was some variant of Rosenbrock similar to this one: https://github.com/JuliaManifolds/Manopt.jl/pull/339/files#diff-3f760f49cdd6fe09857899f193de7724904da2185f32c61378c068841df87cebR245-R267 , likely with a different N . Anyway, I just realized that I did the initial alpha guess and alpha_max suboptimally (I forgot to divide max_stepsize by the norm of descent direction).

@mateuszbaran
Copy link
Contributor Author

It is a thing that I encounter sometimes in optimization on compact manifolds: gradient is way longer than injectivity radius and the initial alpha equal to 1 is a really poor guess.

@timholy
Copy link
Contributor

timholy commented Jan 23, 2024

@mateuszbaran, if you can't easily share a reproducer, one option is to run with option display = typemax(UInt64) and then share the output. That might help us understand better how this bad outcome happens.

@mateuszbaran
Copy link
Contributor Author

Here is a (relatively) simple reproducer:

using Manopt
using Manifolds
using LineSearches
using LinearAlgebra

norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)

function f_rosenbrock_manopt(::AbstractManifold, x)
    result = 0.0
    for i in 1:2:length(x)
        result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
    end
    return result
end

function g_rosenbrock_manopt!(M::AbstractManifold, storage, x)
    for i in 1:2:length(x)
        storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
        storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
    end
    riemannian_gradient!(M, storage, x, storage)
    return storage
end


function test_case_manopt()
    N = 4000
    mem_len = 2
    M = Manifolds.Sphere(N - 1; parameter=:field)
    ls_hz = LineSearches.HagerZhang()

    x0 = zeros(N)
    x0[1] = 1
    manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)

    return quasi_Newton(
        M,
        f_rosenbrock_manopt,
        g_rosenbrock_manopt!,
        x0;
        stepsize=Manopt.LineSearchesStepsize(ls_hz),
        evaluation=InplaceEvaluation(),
        vector_transport_method=ProjectionTransport(),
        return_state=true,
        memory_size=mem_len,
        stopping_criterion=manopt_sc,
    )
end
test_case_manopt()

It gives different numbers than the original example but the issue is essentially the same. The problem happens during the third iteration of minimization using Manopt.

timholy added a commit to timholy/LineSearches.jl that referenced this issue Jan 28, 2024
secant2 calls update, and update might switch to bisection in the U3 step. If we did bisect, the line-search progress
step L2 testing whether the interval is shrinking fast
enough is nonsensical (we might have shrunk multiple
iterations of bisection, but that is not an indication that the
secant model was "working").

Consequently, this reports back about whether bisection
was engaged in `update`, and if so skip any kind of
convergence assessment and do another iteration.

Fixes JuliaNLSolvers#173.
@timholy timholy linked a pull request Jan 28, 2024 that will close this issue
@pkofod
Copy link
Member

pkofod commented Jan 29, 2024

Thanks both of you for taking the initiative here!

timholy added a commit that referenced this issue Oct 22, 2024
This may cause problems of its own, but for the time being it's better
than the status quo.

Fixes #173
Closes #174
Fixes #175
@timholy timholy linked a pull request Oct 22, 2024 that will close this issue
timholy added a commit that referenced this issue Oct 22, 2024
This may cause problems of its own, but for the time being it's better
than the status quo.

Fixes #173
Closes #174
Fixes #175
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants