Automatic step sizes for SVRG #207
Conversation
- This needs to be more exhaustively tested to make sure it works on real datasets.
- The regularization strength might have to be added to the
This PR provides the infrastructure for computing an optimal step size and batch size for SVRG based on the GLM configuration. The optimal hyperparameters depend on the L-smoothness of the loss function. This means that for each model configuration (observation noise, link function, regularization), one may need to compute a different estimate of the smoothness parameters. Here, I implemented a look-up table that should be easy to extend whenever new estimates become available (for example, if we derive the L-smoothness for Gamma + softplus observations).
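As a hedged sketch of the look-up-table idea (hypothetical names, not the actual NeMoS API): a registry could map (observation model, link function) pairs to smoothness estimators, and be extended as new bounds are derived. The Poisson + softplus bound below reuses the constants that appear in this PR's code.

```python
# Hypothetical sketch, not NeMoS's actual API: a registry mapping
# (observation model, link function) pairs to L-smoothness estimators.
import numpy as np

def l_smooth_poisson_softplus(X, y):
    # Bound on the largest eigenvalue of the loss Hessian for Poisson
    # observations with a softplus inverse link (constants from this PR).
    XDX = X.T @ ((0.17 * y[:, None] + 0.25) * X) / y.shape[0]
    return np.sort(np.linalg.eigvalsh(XDX))[-1]

# Extend with new entries (e.g. Gamma + softplus) once a bound is derived.
SMOOTHNESS_LOOKUP = {
    ("Poisson", "softplus"): l_smooth_poisson_softplus,
}

def get_smoothness(obs_model, link, X, y):
    try:
        return SMOOTHNESS_LOOKUP[(obs_model, link)](X, y)
    except KeyError:
        raise NotImplementedError(
            f"No L-smoothness estimate for {obs_model} + {link} yet."
        )
```

Unsupported configurations fail loudly, which keeps the extension point obvious.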
- It is hard for me to review this (not knowing the math) without an example. Do we want to add an example as part of this PR or do a separate one to add an example for "how to use SVRG"?
- We really do need to get the math for this up somewhere.
- Same point as Initialization #252: it looks like black was run on some test scripts for the first time.
- The new test glm (and test population glm) functions are hard to parse. They look like they're doing a lot -- would it make sense to break them up?
**Fitting Large Models**

For very large models, you may consider using the Stochastic Variance Reduced Gradient
would be nice to point to example in the docs here
I agree, but for a future PR in the documentation. I'll link to this comment in the docs project
**Fitting Large Models**

For very large models, you may consider using the Stochastic Variance Reduced Gradient
same point about example and doc link
subsequent pr
This is ready for another round; I should have addressed everything. I added the PDF as an asset, just so we have it in a place that is easy to find. However, I would not add it to the documentation or any public site yet, since it is very much a work in progress. I want to close this PR quickly, before the math is polished.
@billbrod some of the things I resolved had pending comments that I did not submit. I added the SVRG example to one of the issues in the docs project. The rest should be addressed. If you feel like the SVRG description in the GLM class can still be improved, go ahead and change it, I think that's the upper bound of my English writing skills :)
"Please, consider using the power method by setting the `n_power_iters` parameter "
"(default behavior).",

Suggested change:
"Please, consider using the power method by setting the `n_power_iters` parameter ",
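For context on the power method this message refers to: a minimal illustrative implementation of power iteration for estimating the largest eigenvalue of a symmetric matrix (this sketch is not the PR's actual code; function name and iteration count are assumptions).

```python
import numpy as np

def power_iteration(A, n_iters=50, seed=0):
    """Estimate the largest eigenvalue of a symmetric PSD matrix A
    without computing a full eigendecomposition."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        # Repeated multiplication aligns v with the dominant eigenvector.
        w = A @ v
        v = w / np.linalg.norm(w)
    # The Rayleigh quotient gives the eigenvalue estimate.
    return v @ A @ v
```

Each iteration costs one matrix-vector product, which is why it scales better than a direct `eigvalsh` call on large design matrices.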
# Calculate the Hessian directly and find the largest eigenvalue
XDX = X.T.dot((0.17 * y.reshape(y.shape[0], 1) + 0.25) * X) / y.shape[0]
return jnp.sort(jnp.linalg.eigvalsh(XDX))[-1]
except RuntimeError as e:
This should be described in the docstring, under Raises (both here and in the user-facing function)
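A hedged sketch of what that docstring section could look like (hypothetical function name, NumPy-style docstring; the body is a stub, not the PR's real logic):

```python
def softplus_poisson_l_smooth(X, y, n_power_iters=None):
    """Estimate the L-smoothness constant for Poisson + softplus.

    Raises
    ------
    RuntimeError
        If ``n_power_iters`` is None and the direct eigenvalue
        computation of the Hessian fails.
    """
    # Sketch only; see the PR diff for the real implementation.
    raise NotImplementedError
```

The same Raises entry would be mirrored in the user-facing function, as the review comment suggests.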
"n_power_iter, expectation",
[
    (None, pytest.warns(UserWarning, match="Direct computation of the eigenvalues")),
    (1, does_not_raise()),
What's the behavior with `n_power_iter=0`?
This is good to go once we add the RuntimeError for the eigenvalue to the docstrings.
I think the glm docstring is clear enough for now -- once we add the relevant doc, I think we can basically remove that info from the docstring (moving it to the tutorial) and point to the tutorial.
Merged b85f408 into flatironinstitute:development.
Attempt to automatically determine the batch and step sizes for SVRG when fitting a GLM with Poisson observations and a softplus inverse link function.
Based on this paper.
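For intuition only (this sketch does not reproduce the linked paper's exact formulas, and the helper name and constant are assumptions): SVRG convergence analyses typically pick a step size that scales inversely with the smoothness constant L of the loss.

```python
def svrg_step_size(l_smooth, c=4.0):
    # Hypothetical helper: a conservative step size on the order of 1/L,
    # as is standard in SVRG analyses (c is an illustrative constant).
    return 1.0 / (c * l_smooth)

# A less smooth loss (larger L) forces a smaller step.
svrg_step_size(2.0)   # 0.125
svrg_step_size(10.0)  # 0.025
```

This is why the smoothness estimate (look-up table or power method) is the key quantity the PR needs to compute before a step size can be chosen automatically.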