Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic step sizes for SVRG #207

Merged

Conversation

bagibence
Copy link
Collaborator

Attempt to automatically determine the batch- and step sizes for SVRG when fitting a GLM with Poisson observations and a softplus inverse link function.
Based on this paper.

@bagibence
Copy link
Collaborator Author

This needs to be more exhaustively tested to make sure it works on real datasets.
I have even encountered toy examples where my current implementation didn't work either.

@bagibence
Copy link
Collaborator Author

The regularization strength might have to be added to the L and L_max constants determined.
It could also be useful as bound for the convexity when using ridge.

@BalzaniEdoardo BalzaniEdoardo marked this pull request as ready for review August 26, 2024 12:19
@BalzaniEdoardo
Copy link
Collaborator

This PR provides the infrastructure for computing an optimal stepsize and batch_size for SVRG based on the GLM configurations.

The optimal hyperparameters depends on the loss function L-smoothness. This means that for each model configuration (observation noise, link function, regularization), one may need to to compute a different estimate of the smoothness parameters.

Here, I implemented a look-up table that should be easy to extend whenever new estimates becomes available (for example if we derive the L-smoothness for Gamma + softplus observations).

@BalzaniEdoardo BalzaniEdoardo marked this pull request as draft August 26, 2024 12:44
@BalzaniEdoardo BalzaniEdoardo marked this pull request as ready for review August 26, 2024 15:00
Copy link
Member

@billbrod billbrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It is hard for me to review this (not knowing the math) without an example. Do we want to add an example as part of this PR or do a separate one to add an example for "how to use SVRG"?
  • We really do need to get the math for this up somewhere.
  • same point as Initialization #252 where it looks like black was run on some test scripts for the first time
  • The new test glm (and test population glm) functions are hard to parse. they look like they're doing a lot -- would it make sense to break them up?


**Fitting Large Models**

For very large models, you may consider using the Stochastic Variance Reduced Gradient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to point to example in the docs here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but for a future PR in the documentation. I'll link to this comment in the docs project

src/nemos/glm.py Show resolved Hide resolved

**Fitting Large Models**

For very large models, you may consider using the Stochastic Variance Reduced Gradient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same point about example and doc link

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsequent pr

src/nemos/solvers/_compute_defaults.py Outdated Show resolved Hide resolved
src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved
src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved
src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved
tests/test_glm.py Outdated Show resolved Hide resolved
tests/test_svrg_defaults.py Show resolved Hide resolved
tests/test_svrg_defaults.py Show resolved Hide resolved
@BalzaniEdoardo
Copy link
Collaborator

This is ready for another round, I should have addressed everything. The I added the pdf as an asset, just for us to have it in a place that is easy to find. However, I would not add it to the documentation or any public site yet, since it is very much a work in progress. I want to close this PR quickly, and before the math is polished

Copy link
Collaborator

@BalzaniEdoardo BalzaniEdoardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@billbrod some of the things I resolved add pending comments that I did not submit. I added the SVRG example to one of the issue in the docs project. The rest should be addressed. If you feel like the SVRG description in the GLM class can still be improved, go ahead and change it, I think that's the upper-bound of my English writing skills :)

src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved
src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved

**Fitting Large Models**

For very large models, you may consider using the Stochastic Variance Reduced Gradient
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but for a future PR in the documentation. I'll link to this comment in the docs project

src/nemos/glm.py Show resolved Hide resolved

**Fitting Large Models**

For very large models, you may consider using the Stochastic Variance Reduced Gradient
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subsequent pr

src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved
src/nemos/solvers/_svrg_defaults.py Outdated Show resolved Hide resolved
tests/test_svrg_defaults.py Show resolved Hide resolved
tests/test_glm.py Outdated Show resolved Hide resolved
Comment on lines +304 to +305
"Please, consider using the power method by setting the `n_power_iters` parameter "
"(default behavior).",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Please, consider using the power method by setting the `n_power_iters` parameter "
"(default behavior).",
"Please, consider using the power method by setting the `n_power_iters` parameter ",

# Calculate the Hessian directly and find the largest eigenvalue
XDX = X.T.dot((0.17 * y.reshape(y.shape[0], 1) + 0.25) * X) / y.shape[0]
return jnp.sort(jnp.linalg.eigvalsh(XDX))[-1]
except RuntimeError as e:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be described in the docstring, under Raises (both here and in the user-facing function)

"n_power_iter, expectation",
[
(None, pytest.warns(UserWarning, match="Direct computation of the eigenvalues")),
(1, does_not_raise()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior with n_power_iter=0

Copy link
Member

@billbrod billbrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go once we add the RuntimeError for the eigenvalue to the docstrings.

I think the glm docstring is clear enough for now -- once we add the relevant doc, I think we can basically remove that info (moving it to the tutorial) from the docstring and point to the tutorial

@BalzaniEdoardo BalzaniEdoardo merged commit b85f408 into flatironinstitute:development Oct 28, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants