Suboptimal choice of alpha in Hager-Zhang #173

mateuszbaran · 2023-12-30T20:03:51Z

I've encountered a weird issue with alpha value selection in Hager-Zhang. Let me illustrate. Here are the values computed during bracketing:

alphas = [0.0, 1.0, 0.5, 0.25, 0.11797881082031754, 0.05898940541015877, 0.029494702705079385, 0.014747351352539692, 0.007373675676269846]
values = [32867.0, 33089.907756650115, 33027.619212962294, 32749.607153359673, 33107.9648525175, 33056.34280854859, 32990.02494986772, 32936.18207166336, 32831.66557482673]
slopes = [-171068.0, -29730.999177119797, -74242.25565689849, 191429.29678267744, -7900.963612183026, -51707.79133751766, -103642.35349017732, -139136.2036028941, 182648.13853509305]

The search ends on this line

LineSearches.jl/src/hagerzhang.jl

Line 287 in ded667a

return A, values[iA]

, with iA=1, ia=1, ib=4 and iB=9. Clearly, out of these four, iA is the worst choice (highest cost value) but nevertheless it's the value returned by the procedure. My function isn't particularly well-behaved but shouldn't the line search try a bit harder to pick the best value? I mean, it could just look at ib and iB instead of just returning iA in that line. What's worse here, the returned alpha is 0 which throws off the direction selection algorithm I'm using.

The text was updated successfully, but these errors were encountered:

mateuszbaran · 2024-01-16T10:52:30Z

I've forked H-Z with a fix: https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl .

pkofod · 2024-01-16T21:36:44Z

Either it's not thought of in the original implementation, or there's a bug here perhaps. I'll have to look at the code and maybe my other implementation in NLSolvers.jl. Maybe the paper as well.

timholy · 2024-01-22T10:29:03Z

For reference, here's a plot of the values:

I think HZ assumes the function is differentiable. If it isn't (obviously I can't tell from the plot), then another line search method might be better.

But that exit line you identified looks like a "I'm confused, let's bail" and so it would make sense to take an argmin(values) and return that alpha.

mateuszbaran · 2024-01-22T12:38:43Z

The original function was differentiable, and taking the minimum is a solution that works for me. I can make a PR if it's an acceptable fix, though this issue may be a symptom of a bug in HZ.

timholy · 2024-01-22T20:32:14Z

Interesting. Is it a case you can share? I can't promise to spend time on it, but I am intrigued.

mateuszbaran · 2024-01-22T21:13:38Z

It was some variant of Rosenbrock similar to this one: https://github.com/JuliaManifolds/Manopt.jl/pull/339/files#diff-3f760f49cdd6fe09857899f193de7724904da2185f32c61378c068841df87cebR245-R267 , likely with a different N . Anyway, I just realized that I did the initial alpha guess and alpha_max suboptimally (I forgot to divide max_stepsize by the norm of descent direction).

mateuszbaran · 2024-01-22T21:16:47Z

It is a thing that I encounter sometimes in optimization on compact manifolds: gradient is way longer than injectivity radius and the initial alpha equal to 1 is a really poor guess.

timholy · 2024-01-23T13:38:25Z

@mateuszbaran, if you can't easily share a reproducer, one option is to run with option display = typemax(UInt64) and then share the output. That might help us understand better how this bad outcome happens.

mateuszbaran · 2024-01-23T18:25:18Z

Here is a (relatively) simple reproducer:

using Manopt
using Manifolds
using LineSearches
using LinearAlgebra

norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)

function f_rosenbrock_manopt(::AbstractManifold, x)
    result = 0.0
    for i in 1:2:length(x)
        result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
    end
    return result
end

function g_rosenbrock_manopt!(M::AbstractManifold, storage, x)
    for i in 1:2:length(x)
        storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
        storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
    end
    riemannian_gradient!(M, storage, x, storage)
    return storage
end


function test_case_manopt()
    N = 4000
    mem_len = 2
    M = Manifolds.Sphere(N - 1; parameter=:field)
    ls_hz = LineSearches.HagerZhang()

    x0 = zeros(N)
    x0[1] = 1
    manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)

    return quasi_Newton(
        M,
        f_rosenbrock_manopt,
        g_rosenbrock_manopt!,
        x0;
        stepsize=Manopt.LineSearchesStepsize(ls_hz),
        evaluation=InplaceEvaluation(),
        vector_transport_method=ProjectionTransport(),
        return_state=true,
        memory_size=mem_len,
        stopping_criterion=manopt_sc,
    )
end
test_case_manopt()

It gives different numbers than the original example but the issue is essentially the same. The problem happens during the third iteration of minimization using Manopt.

secant2 calls update, and update might switch to bisection in the U3 step. If we did bisect, the line-search progress step L2 testing whether the interval is shrinking fast enough is nonsensical (we might have shrunk multiple iterations of bisection, but that is not an indication that the secant model was "working"). Consequently, this reports back about whether bisection was engaged in `update`, and if so skip any kind of convergence assessment and do another iteration. Fixes JuliaNLSolvers#173.

pkofod · 2024-01-29T08:59:42Z

Thanks both of you for taking the initiative here!

This may cause problems of its own, but for the time being it's better than the status quo. Fixes #173 Closes #174 Fixes #175

mateuszbaran mentioned this issue Dec 30, 2023

New benchmark JuliaManifolds/Manopt.jl#339

Closed

2 tasks

mateuszbaran mentioned this issue Jan 16, 2024

New package: ImprovedHagerZhangLinesearch v0.1.0 JuliaRegistries/General#98950

Closed

timholy linked a pull request Jan 28, 2024 that will close this issue

HZ: avoid convergence if bisected #174

Open

timholy added a commit that referenced this issue Oct 22, 2024

Disable the flatness check in HZ

045e126

This may cause problems of its own, but for the time being it's better than the status quo. Fixes #173 Closes #174 Fixes #175

timholy linked a pull request Oct 22, 2024 that will close this issue

Add test cases, disable flatness check in HZ #180

Draft

timholy added a commit that referenced this issue Oct 22, 2024

Disable the flatness check in HZ

c043db6

This may cause problems of its own, but for the time being it's better than the status quo. Fixes #173 Closes #174 Fixes #175

timholy mentioned this issue Oct 24, 2024

Issues affecting changes in line search methods JuliaNLSolvers/Optim.jl#1104

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suboptimal choice of alpha in Hager-Zhang #173

Suboptimal choice of alpha in Hager-Zhang #173

mateuszbaran commented Dec 30, 2023

mateuszbaran commented Jan 16, 2024

pkofod commented Jan 16, 2024

timholy commented Jan 22, 2024

mateuszbaran commented Jan 22, 2024

timholy commented Jan 22, 2024

mateuszbaran commented Jan 22, 2024

mateuszbaran commented Jan 22, 2024

timholy commented Jan 23, 2024

mateuszbaran commented Jan 23, 2024

pkofod commented Jan 29, 2024

Suboptimal choice of alpha in Hager-Zhang #173

Suboptimal choice of alpha in Hager-Zhang #173

Comments

mateuszbaran commented Dec 30, 2023

mateuszbaran commented Jan 16, 2024

pkofod commented Jan 16, 2024

timholy commented Jan 22, 2024

mateuszbaran commented Jan 22, 2024

timholy commented Jan 22, 2024

mateuszbaran commented Jan 22, 2024

mateuszbaran commented Jan 22, 2024

timholy commented Jan 23, 2024

mateuszbaran commented Jan 23, 2024

pkofod commented Jan 29, 2024