Explicit model specification #68

Merged (20 commits) on May 20, 2021

Conversation

wilsonmr (Owner)

It looks slightly circular but I think I was careful to catch the infinite recursive loops.

The idea is that sequential_model allows the user to specify exactly which layers they want, in any order. See the new runcard I added for reference. I believe this means you can train all the models we talk about in the paper (and any others you can think of). It also means you can specify models where not all layers have the same "shared" parameters.

I was also careful to retain backwards compatibility, but we should check this more thoroughly.
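Roughly, the idea is to compose a user-supplied list of layer specifications into a single flow. A minimal sketch of that pattern is below; the registry and layer names are illustrative stand-ins, not the actual anvil/reportengine actions:

```python
import torch.nn as nn

# Illustrative registry of layer constructors; the real anvil layers are
# coupling transformations, these stand-ins only show the composition pattern.
LAYER_REGISTRY = {
    "affine": lambda **params: nn.Identity(),
    "rational_quadratic_spline": lambda **params: nn.Identity(),
    "global_rescaling": lambda **params: nn.Identity(),
}

def sequential_model(layer_specs):
    """Build the flow from an ordered list of layer specifications.

    Each spec is a dict like {"layer": "affine", ...}; the remaining keys are
    passed to that layer's constructor, so different layers do not have to
    share any "common" parameters.
    """
    layers = []
    for spec in layer_specs:
        spec = dict(spec)          # avoid mutating the caller's runcard
        name = spec.pop("layer")
        layers.append(LAYER_REGISTRY[name](**spec))
    return nn.Sequential(*layers)
```

With an interface of this shape, a model such as affine -> spline -> affine is just a three-element list in the runcard.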

wilsonmr requested a review from jmarshrossney on April 28, 2021 at 10:28
jmarshrossney (Collaborator) left a comment

This is really great.

I do think we may as well standardise the inputs for the models, i.e. n_blocks instead of n_affine, n_spline, n_additive, and also replace z2_equivar_spline with z2_equivar.

This does mean that the models affine_spline and spline_affine would no longer be particularly useful, and I'm sure this is why you didn't already do this, but personally I don't mind that one bit since the new scheme is a massive improvement.

wilsonmr (Owner, Author)

> I do think we may as well standardise the inputs for the models

Let's discuss this on Slack.

wilsonmr (Owner, Author)

I removed the preset models, slightly changed the naming conventions, and then standardised the layer inputs. I think this makes the specification of flows flexible. It makes specifying the "standard" flow a bit more verbose, but I think it's worth it.

I think it might be clearer if we made sequential go into layers and renamed core to feed_forward_network, keeping it as a single-class module. We don't need to decide yet.

I changed the equivar bit of RQS because it looked like it was using an undefined variable; can you check that I changed it to the correct variable?

Also, I changed the comments because I thought we were trying to enforce C(-v) == C(v), not anti-symmetry, but maybe I'm missing something here?

Review threads on anvil/models.py and examples/runcards/train.yml (outdated, resolved).
wilsonmr (Owner, Author) commented Apr 30, 2021

  • add tests.

wilsonmr (Owner, Author)

This is ready for review. By the way, should we add an epsilon to our batch norm? If the standard deviation of the input is really small, this can be unstable.
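For reference, the usual fix is a small epsilon added to the standard deviation in the denominator; a minimal sketch of the idea (not the anvil implementation):

```python
import torch

def batch_normalise(v: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Rescale a batch to zero mean and unit variance.

    The epsilon keeps the division well behaved when the batch standard
    deviation happens to be very small.
    """
    return (v - v.mean()) / (v.std() + eps)
```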

jmarshrossney (Collaborator)

> I changed the equivar bit of RQS because it looked like it was using an undefined variable; can you check that I changed it to the correct variable?

Yep, good spot.

> Also, I changed the comments because I thought we were trying to enforce C(-v) == C(v), not anti-symmetry, but maybe I'm missing something here?

Ah, the typo was in the affine layers, thanks for spotting it. We do want the coupling layers to be odd, but exp(-s) needs to be even, hence replacing s -> |s|.
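To spell out the symmetry argument: if the networks producing s and t are odd under v -> -v, then |s| is even, so exp(-|s|) is even and the coupling as a whole stays odd. A sketch of the relevant step (illustrative only, not the actual anvil affine layer; net_s and net_t stand in for the fully connected networks):

```python
import torch

def affine_coupling(v_passive, v_active, net_s, net_t):
    """Affine coupling step, sketched.

    Assuming net_s and net_t are odd in the passive variables (e.g. no
    biases), |s| is even under v -> -v, so the multiplicative factor
    exp(-|s|) is even while the shift t stays odd; the output therefore
    flips sign with the input, C(-v) == -C(v).
    """
    s = net_s(v_passive)
    t = net_t(v_passive)
    return v_active * torch.exp(-s.abs()) + t
```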

> By the way, should we add an epsilon to our batch norm? If the standard deviation of the input is really small, this can be unstable.

Yeah, you're right. I removed it when I was getting rid of all the unused rubbish, since if we've got a tiny standard deviation there's probably something else that's gone wrong, but it's probably sensible to keep it.

jmarshrossney (Collaborator)

I think we can now add batch normalisation and global rescaling layers as layers in their own right, rather than including them as part of the three existing models. I'm going to do this because batch normalisation and rescaling are definitely things we would like to be able to easily toggle on/off.
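A standalone global rescaling layer could be as small as the following sketch (the class name, signature, and the sign convention for the log-density update are assumptions here, not the actual anvil code):

```python
import torch
import torch.nn as nn

class GlobalRescaling(nn.Module):
    """Multiply every field variable by a single learnable scale."""

    def __init__(self, initial_scale: float = 1.0):
        super().__init__()
        self.log_scale = nn.Parameter(torch.log(torch.tensor(float(initial_scale))))

    def forward(self, phi: torch.Tensor, log_density: torch.Tensor):
        # log|det J| for scaling N degrees of freedom is N * log(scale);
        # the sign used to update log_density is assumed, not taken from anvil.
        scale = torch.exp(self.log_scale)
        log_det_jacobian = phi.shape[-1] * self.log_scale
        return phi * scale, log_density - log_det_jacobian
```

Keeping it as its own layer means toggling rescaling on or off is just adding or removing one entry in the model list.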

jmarshrossney (Collaborator)

Ok, I've done this, but the problem is that we can't have independent instances of these layer actions, which means I can only include one global_rescaling layer in the model (I could put it in more than one place, but this isn't useful). This also means we can't build models like e.g. affine -> spline -> affine, since the affine sections will share parameters.

I guess this comes back to the old stumbling block, that reportengine is specifically designed to not re-calculate anything. Do you think there is a relatively painless workaround?

wilsonmr (Owner, Author)

No, you can have as many layers as you want; the instances shouldn't be identical.

wilsonmr (Owner, Author)

Like I can have (not a valid input, but hopefully you get the point):

```yaml
model:
  - layer: global_rescaling
    <params>
  - layer: global_rescaling
    <params>
  # etc.
```

wilsonmr (Owner, Author)

That last build was taking ages and looked like it was stuck at the testing stage. I'm just double-checking that the tests still run in a reasonable time.

jmarshrossney (Collaborator)

For future reference:

> Ok, I've done this, but the problem is that we can't have independent instances of these layer actions, which means I can only include one global_rescaling layer in the model (I could put it in more than one place, but this isn't useful). This also means we can't build models like e.g. affine -> spline -> affine, since the affine sections will share parameters.

This problem was occurring when the model contained multiple layers which only used default parameters. If the user overrides one or more defaults, the layers are no longer duplicates and everything works as expected.
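A toy illustration of why identical specs collapsed into one object (this mimics memoisation on the inputs; it is not reportengine's actual machinery):

```python
# Results are cached per distinct input, so two layer specs that are exactly
# the same (all defaults) resolve to one shared object rather than two
# independent instances; overriding any default makes the specs distinct.
_cache = {}

def build_layer(spec: dict):
    key = tuple(sorted(spec.items()))
    if key not in _cache:
        _cache[key] = object()   # stand-in for constructing a real layer
    return _cache[key]

layer_a = build_layer({"layer": "affine"})                  # all defaults
layer_b = build_layer({"layer": "affine"})                  # identical spec
layer_c = build_layer({"layer": "affine", "n_blocks": 2})   # default overridden

assert layer_a is layer_b        # duplicates share one instance
assert layer_a is not layer_c    # distinct spec gives a new instance
```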

wilsonmr (Owner, Author)

Is this done? Shall I double-check your changes?

jmarshrossney (Collaborator)

Just writing a test to catch the case where layers wrongly share parameters.

wilsonmr (Owner, Author)

If you do something like layer.parameters *= 2 and then check whether both layers' parameters are still equal, does that fail for the current failing case and succeed when we remove all defaults?
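A sketch of what such an independence check could look like (the helper name is hypothetical, not the test added in this PR):

```python
import torch
import torch.nn as nn

def assert_layers_independent(layer_a: nn.Module, layer_b: nn.Module):
    """Mutate one layer's parameters and check the other is unaffected.

    If the two layers secretly share parameter tensors, doubling layer_a's
    parameters also changes layer_b's, and the equality check below trips.
    Assumes the parameters are not all zero to begin with.
    """
    with torch.no_grad():
        for param in layer_a.parameters():
            param.mul_(2)

    for p_a, p_b in zip(layer_a.parameters(), layer_b.parameters()):
        assert not torch.equal(p_a, p_b), (
            "Parameters are being shared amongst layers that should be independent."
        )
```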

Comment on lines 87 to 88
raise LayersNotIndependentError(
"Parameters are being shared amongst layers that should be independent."
wilsonmr (Owner, Author) commented May 14, 2021

Was there a reason you couldn't expect an assertion error?

jmarshrossney (Collaborator)

no
