Explicit model specification #68
Conversation
This is really great.

I do think we may as well standardise the inputs for the models, i.e. `n_blocks` instead of `n_affine`, `n_spline`, `n_additive`, and also replace `z2_equivar_spline` with `z2_equivar`.

This does mean that the models `affine_spline` and `spline_affine` would no longer be particularly useful, and I'm sure this is why you didn't already do this, but personally I don't mind that one bit, since the new scheme is a massive improvement.

Let's discuss this on Slack.
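To illustrate the proposal, here is a minimal sketch of what the standardised scheme might look like. The key names `n_blocks` and `z2_equivar` come from the comment above; everything else (the dict layout, the `layer` key) is a hypothetical illustration, not the project's actual format:

```python
# Hypothetical sketch: every coupling-layer type takes the same
# standardised keys, instead of per-model names like
# n_affine / n_spline / n_additive.
affine_spec = {"layer": "affine", "n_blocks": 4, "z2_equivar": True}
spline_spec = {"layer": "spline", "n_blocks": 2, "z2_equivar": True}

# Mapping an old-style key onto the shared name:
legacy = {"n_affine": 4}
standardised = {"n_blocks": legacy["n_affine"]}
```

The point is that once all layer types accept the same keys, the layers themselves compose freely and preset combinations like `affine_spline` become unnecessary.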
Force-pushed from 1b487f0 to 21c0af3
I removed the preset models, slightly changed the naming conventions, and then standardised the layer inputs; I think this makes the specification of flows flexible. It makes specifying the "standard" flow a bit more verbose, but I think it's worth it.

It might be clearer if we made `sequential` go into `layers` and then renamed `core` to `feed_forward_network`, keeping it as a single-class module. We don't need to decide yet.

I changed the equivar bit of RQS because it looked like it was using an undefined variable; can you check that I changed it to the correct variable? Also, I changed the comments because I thought we were trying to enforce C(-v) == C(v), not anti-symmetry, but maybe I'm missing something here?
Force-pushed from 21c0af3 to 76cb8ea ("…for much more flexibility")
This is ready for review. By the way, should we add an epsilon to our batch norm? If the standard deviation of the input is really small, this can be unstable.
Yep, good spot.
Ah, the typo was in the affine layers, thanks for spotting. We do want the coupling layers to be odd, but exp(-s) needs to be even, hence replacing s -> |s|.
Yeah, you're right. I removed it when I was getting rid of all the unused rubbish, since if we've got a tiny standard deviation there's probably something else that's gone wrong, but it's probably sensible to keep it.
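For reference, the usual fix is a small constant added to the denominator so a near-constant batch cannot divide by zero. A minimal sketch (the function name and epsilon value are assumptions, not the project's actual implementation):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Dividing by (std + eps) keeps the normalisation stable even
    # when the batch standard deviation is tiny.
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / (std + eps)

# A near-constant batch would otherwise divide by ~0:
x = np.full((10, 2), 3.0)
out = batch_norm(x)
```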
I think we can now add batch normalisation and global rescaling layers as layers in their own right, rather than including them as part of the three existing models. I'm going to do this, because batch normalisation and rescaling are definitely things we would like to be able to toggle on and off easily.
OK, I've done this, but the problem is that we can't have independent instances of these layer actions, which means I can only include one. I guess this comes back to the old stumbling block: reportengine is specifically designed not to re-calculate anything. Do you think there is a relatively painless workaround?
No, you can have as many layers as you want; the instances shouldn't be identical.
Like I can have (not a valid input, but hopefully you get the point):
That last build was taking ages and looked like it was stuck at the testing stage. I'm just double-checking that the tests still run in a reasonable time.
For future reference:
This problem was occurring when the model contained multiple layers which only used default parameters. If the user overrides one or more defaults, the layers are no longer duplicates and everything works as expected. |
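The behaviour described above can be mimicked with plain memoisation. This is an analogy using `functools.lru_cache`, not reportengine's real API: requests with identical arguments come back as the same cached object, and overriding any default makes the instances distinct again:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def make_layer(scale=1.0):
    # Stand-in for a resource built by a memoising framework:
    # identical arguments return the *same* object.
    return {"scale": scale}

a = make_layer()           # all defaults
b = make_layer()           # same defaults -> same cached instance
c = make_layer(scale=2.0)  # overridden default -> fresh instance
assert a is b
assert a is not c
```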
Is this done? Shall I double-check your changes?
Just writing a test to catch the case where layers wrongly share parameters.
If you do like:
anvil/tests/test_models.py (outdated)
raise LayersNotIndependentError(
    "Parameters are being shared amongst layers that should be independent."
)
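A self-contained sketch of what such a test could look like. `LayersNotIndependentError` is redefined here so the snippet runs on its own, and `check_layers_independent` is a hypothetical checker, not the project's actual function (in a real pytest suite you would wrap the call in `pytest.raises`):

```python
class LayersNotIndependentError(Exception):
    """Stand-in redefinition so the sketch is self-contained."""

def check_layers_independent(layers):
    # Hypothetical checker: raise if any parameter *object* appears
    # in more than one layer (identity, not value, is what matters).
    seen = set()
    for layer in layers:
        for param in layer:
            if id(param) in seen:
                raise LayersNotIndependentError(
                    "Parameters are being shared amongst layers that "
                    "should be independent."
                )
            seen.add(id(param))

shared = [0.5]                    # one parameter object
layers = [[shared], [shared]]     # wrongly shared between two layers
try:
    check_layers_independent(layers)
    caught = False
except LayersNotIndependentError:
    caught = True
```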
Was there a reason you couldn't expect an assertion error?
No.
It looks slightly circular, but I think I was careful to catch the infinite recursive loops.
The idea is that `sequential_model` allows the user to specify exactly which layers they want, in any order. See the new runcard I added for reference. I believe this means you can train all the models we talk about in the paper (and any others you can think of). It also means you can specify models where not all layers have the same "shared" parameters.

I was also careful to retain backwards compatibility, but we should check this more thoroughly.
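A toy sketch of the idea (the constructor names and the list-of-pairs spec are hypothetical illustrations, not the actual `sequential_model` signature): the user lists layers in the exact order they want, and each entry gets its own parameters, so nothing is forced to be shared:

```python
# Hypothetical layer constructors standing in for the real ones.
def affine_layer(n_blocks=2):
    return {"type": "affine", "n_blocks": n_blocks}

def spline_layer(n_blocks=1):
    return {"type": "spline", "n_blocks": n_blocks}

def sequential_model(spec):
    # spec is an ordered list of (constructor, kwargs) pairs, so any
    # ordering and mixture of layers can be requested.
    return [build(**kwargs) for build, kwargs in spec]

model = sequential_model([
    (affine_layer, {"n_blocks": 4}),
    (spline_layer, {}),
    (affine_layer, {"n_blocks": 2}),  # per-layer params need not match
])
```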