
Core principles #53

Open
jmarshrossney opened this issue Dec 8, 2020 · 6 comments

Comments
@jmarshrossney
Collaborator

jmarshrossney commented Dec 8, 2020

There's a lot of work that needs doing to move this project forward. Feeling somewhat paralysed by considering, in isolation, what we could do, I tried to write down what I thought were the core principles of this project -- i.e. the things we really care about and want to work as well as possible. Sometimes a compass is more useful than a map!

I think it might be useful to have this in writing and check that we're in agreement, for the most part.


  1. [Core] code should be as accessible as possible. Specifically, it should be easy to 'borrow' from, modify, and extend. Hence,
  • (a) Preferred tools are open source, maintained and, ideally, easy to use.
  • (b) One should be able to easily test new ideas by throwing things together in scripts. E.g. scripts that look like
prior_dist = Gaussian(mu=0, sigma=1)

# I've thrown a load of different layers together in a peculiar way that's not currently implemented in anvil
flow = MyFunkyFlow(*my_funky_params) 

for epoch in range(10000):
  my_funky_training_loop(prior_dist, flow)  # I've done some totally crazy stuff in this training loop

# Now I'd like to check everything is ok but I don't want to have to necessarily use pre-written functionality
latents, log_density = prior_dist.sample(N=1000)
outputs, log_density = flow.forward(latents, log_density)
my_funky_diagnostics(outputs)
my_funky_plot(latents, outputs)
  • (c) It would be nice if this project didn't force the user to use the code exactly as the developers intended, and didn't punish them for trying to do things differently.
  2. Important results should be reproducible, with a high degree of confidence. We don't want to be questioning whether or not some data was obtained with parameter X switched on or off.
  • (a) Normal operation should involve the use of YAML runcards which are generated alongside training/sampling outputs, and are not touched henceforth. BUT we don't want to preclude the use of scripts as described above, or we will slow down the trialling of new ideas.
  • (b) For this to work we need to be able to write very detailed runcards, which specify exactly how the layers work. This is very easy in the phi four case but might be more challenging down the line. However, we also don't want to make writing runcards tedious (I think you can reuse bits of YAML multiple times, which would make specifying flows with lots of layers much easier; see the YAML sketch after this list!)
  3. It needs to be flexible enough to extend to higher dimensional fields, and to introduce layers which partition the fields in more complicated ways.
  • (a) We need to replace the hard-coded checkerboard partitioning with something more flexible, that can be specified on a per-layer basis (for example, you might want to have coupling layers split along red/black lattice sites then immediately afterwards coupling layers split along different field components).
  • (b) We would ideally like the coupling layers to work in a hierarchical way, so that any one-dimensional transformation can be used within a block of transformations that transform every component. PyTorch makes this reasonably straightforward.
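
On the runcard-reuse point in 2(b): YAML anchors, aliases and the merge key should cover it. A rough sketch, with made-up keys rather than the actual anvil runcard schema:

# define a set of layer parameters once...
affine_defaults: &affine_defaults
  hidden_nodes: [36, 36]
  activation: tanh

model:
  layers:
    # ...and reuse it, overriding only what changes per layer
    - <<: *affine_defaults
      partitioning: checkerboard
    - <<: *affine_defaults
      partitioning: checkerboard_offset

PyYAML supports the merge key (<<), so repeated parameter blocks only need to be written out once.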
@wilsonmr
Owner

I agree with everything. It seems like a lot but I don't think it will be too bad. I think with the partitioning we should probably have a library of partitions; checkerboard will be one, and they are somehow specified as part of the layer.

btw this reminds me that there is an API which would mean you could call anvil like anvil.<action_name>(**<runcard_inputs>), and you can call any action in the dependency list. You can even call production rules etc. It's useful when the pipeline for getting to a point involves calling loads of functions sequentially but only has a few runcard dependencies.
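
Something like the following, say; the action name and runcard keys here are just placeholders to show the shape of the call, not actual anvil actions:

import anvil

# hypothetical action and runcard keys: in principle any action in the
# dependency graph can be requested, passing its runcard inputs as kwargs
sample = anvil.sample_from_trained_model(training_output="path/to/training_output")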

@jmarshrossney
Collaborator Author

jmarshrossney commented Dec 10, 2020

The API sounds very useful.

I think with the partitioning we should probably have a library of partitions; checkerboard will be one, and they are somehow specified as part of the layer.

Yeah exactly. The easiest thing would be to allow each layer to possess either a generalisation of split_func and join_func or a masking tensor, so each layer does a split-transform-join... which to be fair is exactly what you coded up originally! The downside of this is that it's inefficient if we use the same partitioning scheme for successive layers, which is why we ended up wrapping the entire flow in a single split and join, but then we lose flexibility.
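
To make that concrete, the per-layer mask version would be something along these lines (class and attribute names are made up for illustration, not anything currently in anvil):

import torch
import torch.nn as nn

class MaskedCoupling(nn.Module):
    """Hypothetical coupling layer that owns its own partition mask."""

    def __init__(self, mask, shift_net):
        super().__init__()
        self.register_buffer("mask", mask)  # e.g. a checkerboard of 0s and 1s
        self.shift_net = shift_net          # network acting on the passive partition

    def forward(self, phi, log_density):
        # split: the passive sites condition the transformation of the active sites
        passive = phi * self.mask
        shift = self.shift_net(passive) * (1 - self.mask)
        # transform and join in one step; additive coupling leaves log_density unchanged
        return phi + shift, log_density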

What I'm thinking about is actually dedicated layers for partitioning the fields, that are nn.Modules so can form part of the chain. Then the layers just receive the active and passive partitions as separate inputs.

In that case we would want to be able to sandwich multiple layers between split-join operations. I guess the ideal way to write this in a runcard would be using syntax like

global_transformation_layer;
partitioning_scheme_1 {
    coupling_block_1;
    partitioning_scheme_2 {
        coupling_block_2;
        coupling_block_2;
    }
}
inverse_global_transformation_layer;

But I guess with YAML we would end up with the less attractive

- global_transformation_layer
- partitioning_scheme: 1
  layers:
    - coupling_block_1
    - partitioning_scheme: 2
      layers:
        - coupling_block_2
        - coupling_block_2
- inverse_global_transformation_layer

where the layers (e.g. coupling_block_1) would be dictionaries, possibly defined separately in the YAML file.
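
For instance, with the blocks defined once at the top of the file and referenced by alias (keys again made up for illustration):

coupling_block_1: &coupling_block_1
  transformation: additive
  hidden_nodes: [36, 36]

coupling_block_2: &coupling_block_2
  transformation: affine
  hidden_nodes: [36, 36]

model:
  - global_transformation_layer
  - partitioning_scheme: 1
    layers:
      - *coupling_block_1
      - partitioning_scheme: 2
        layers:
          - *coupling_block_2
          - *coupling_block_2
  - inverse_global_transformation_layer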

I'm just trying to thought-experiment how this would work. I failed to get something similar to work in #46.

If you wanted to keep the flow 'flat' you could do

- global_transformation_layer
- partitioning_scheme_1
- coupling_block_1
- partitioning_scheme_2
- coupling_block_2
- coupling_block_2
- undo_partitioning_scheme_2
- undo_partitioning_scheme_1
- inverse_global_transformation_layer

but this is risky I think.

@wilsonmr
Owner

Why do you say it's risky?

@jmarshrossney
Collaborator Author

jmarshrossney commented Dec 10, 2020

Hmm partly because it relies on the user to do things in the right order.

Also, in the nested case I think we could have one nn.Module that does both the splitting and the joining, which works something like

import torch.nn as nn

class partitioning_scheme(nn.Module):
    def __init__(self, nested_layers, *args_for_partitioning):
        super().__init__()
        self.nested_layers = nested_layers
        # args_for_partitioning (lattice size etc.) would define self.split and self.join
        ...

    def forward(self, input_tensor, unused_partition):
        z_a, z_b = self.split(input_tensor)           # partition the fields
        phi_a, phi_b = self.nested_layers(z_a, z_b)   # transform within the partition
        output_tensor = self.join(phi_a, phi_b)       # undo the partitioning
        return output_tensor, unused_partition

instead of having two different nn.Modules which don't share args_for_partitioning, whatever they may be (lattice size, for example).
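
For what it's worth, nesting these would then mirror the runcard structure above. A rough sketch, where Sequence is a stand-in for whatever chains layers that pass (tensor, partition) pairs along, and the block names are hypothetical:

# the inner scheme wraps the innermost coupling blocks...
inner = partitioning_scheme(Sequence(coupling_block_2, coupling_block_2), lattice_length)
# ...and the outer scheme wraps coupling_block_1 followed by the inner scheme
outer = partitioning_scheme(Sequence(coupling_block_1, inner), lattice_length)
flow = Sequence(global_transformation_layer, outer, inverse_global_transformation_layer)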

@wilsonmr
Owner

Remember we can add checks to enforce certain rules, although I do think nested might just work better here.

@wilsonmr
Owner

I think we're closer to this. It would be nice to expand https://wilsonmr.github.io/anvil/get-started/basic-usage.html to include some of what you say, and perhaps help a bit more with the core objects part. Also, the reportengine actions are still a bit obfuscated; I think I should add a bit more explanation of how this works.
