This is a really great implementation and improvement of MAML!
I'm curious about whether it's actually a good idea to let the network meta-learn a nonzero bias initialization for the linear classifier head. Since a support/target set could be fed to the model under any permutation, I would think that it doesn't make sense to favor one of the output classes over another. Any thoughts on this?
In a very similar vein, I'm wondering whether it would make sense to tie the weights of the linear classifier head. That is, for a 20-way classification problem, instead of having 20 independently meta-learned weight rows (plus biases) in the head, you'd have a single weight vector that is shared across all 20 output units.
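Roughly, the difference would look something like this (a minimal sketch; the shapes and variable names are illustrative, not the repo's actual MetaLinearLayer parameters):

import torch
import torch.nn as nn

num_classes, feat_dim = 20, 64  # illustrative sizes for a 20-way head

# untied head (current setup): one meta-learned weight row and bias per class
untied_weights = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
untied_bias = nn.Parameter(torch.zeros(num_classes))

# tied head: a single shared weight row replicated across classes, and no meta-learned bias
shared_weight = nn.Parameter(torch.randn(1, feat_dim) * 0.01)
tied_weights = shared_weight.expand(num_classes, feat_dim)  # all rows identical at meta-init
tied_bias = torch.zeros(num_classes)                        # fixed at zero, not meta-learned

Since expand is just a view, gradients from every output unit would flow back into the single shared row, so the meta-learned initialization stays permutation-symmetric by construction.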
I did a simple experiment where I took a trained Omniglot-20way-1shot MAML model and added the following code to MAMLFewShotClassifier.load_model
...
with torch.no_grad():
    # zero the classifier bias
    state_dict_loaded['classifier.layer_dict.linear.bias'].zero_()
    # replace the linear weights with their average across output classes
    w = state_dict_loaded['classifier.layer_dict.linear.weights'].clone()
    w = w.mean(dim=0, keepdim=True).expand_as(w)
    state_dict_loaded['classifier.layer_dict.linear.weights'].set_(w)
...
This bumped accuracy from 0.943 to 0.953, which is neat. This is just anecdotal evidence for now though, since I've only tested it on a single model.
I'm wondering whether you'd be able to send me all of your pretrained models somehow, so I can run this experiment across all of the configurations you report in the paper. It would be cool if this trick led to improved performance across the board.
(Then, of course, the next step would be to train models end-to-end w/ these modifications and see if that gives a slight bump.)
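For concreteness, here's a rough, self-contained sketch of what that tied, zero-bias head could look like inside a MAML-style inner loop. All names, sizes, and the standalone functional style are illustrative assumptions on my part, not the repo's MAMLFewShotClassifier code:

import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes, inner_lr = 64, 20, 0.1   # illustrative sizes/hyperparameters

# meta-learned head parameters: a single shared weight row, and no bias at all
shared_weight = nn.Parameter(torch.randn(1, feat_dim) * 0.01)

# at the start of each task, replicate the shared row into per-class fast weights
fast_weight = shared_weight.expand(num_classes, feat_dim)

# stand-in support-set features/labels for a 20-way 1-shot task
support_feats = torch.randn(num_classes, feat_dim)
support_labels = torch.arange(num_classes)

# one inner-loop step; the per-class rows can now diverge from each other,
# while the meta-gradient still flows back into the single shared row
loss = F.cross_entropy(F.linear(support_feats, fast_weight), support_labels)
grad = torch.autograd.grad(loss, fast_weight, create_graph=True)[0]
fast_weight = fast_weight - inner_lr * grad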
Thanks!
~ Ben
You bring up a very interesting point. Initializing all bias params to zero accelerates training of MAML++, because it's closer to a good solution. Your idea lies in a similar vein. Let me know if this improves any of the Mini-ImageNet results.
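Concretely, that zero-bias initialization amounts to something like the following, applied to the meta-parameters before training starts (the helper here is only an illustration, not the code the repo actually uses):

import torch.nn as nn

def zero_all_biases(model: nn.Module) -> None:
    # set every parameter whose name ends in 'bias' to zero at (meta-)initialization
    for name, param in model.named_parameters():
        if name.endswith('bias'):
            nn.init.zeros_(param)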
Unfortunately, I no longer have the full pretrained models, but I could rerun them if you want, that is, if you are lacking the compute. I could also provide some of the 'High-End MAML++' models described in my Learning to Learn via Self-Critique paper.