Research on Joint Energy Models (JEM) is still in its early stages (as far as I'm aware), but it draws on an existing body of literature on Energy-Based Models (EBM). The former is concerned with modelling the joint objective, i.e. learning to predict outputs and generate inputs, while the latter is only concerned with the generative task (I am brutally simplifying and generalising here, please correct me if this seems wrong).
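For reference, the decomposition I have in mind is the standard one from the JEM literature (Grathwohl et al.), where a classifier with logits $f_\theta(x)$ is reinterpreted as an energy-based model:

$$
p_\theta(x, y) = \frac{\exp(f_\theta(x)[y])}{Z(\theta)}, \qquad
p_\theta(x) = \frac{\sum_{y'} \exp(f_\theta(x)[y'])}{Z(\theta)}, \qquad
p_\theta(y \mid x) = \frac{\exp(f_\theta(x)[y])}{\sum_{y'} \exp(f_\theta(x)[y'])},
$$

so the training objective splits as $\log p_\theta(x, y) = \log p_\theta(y \mid x) + \log p_\theta(x)$: a standard cross-entropy term plus a generative term that requires sampling (e.g. via SGLD).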
When building this package I tried following best practices prescribed in the literature. The resources I have looked at for reference include (among other things):
I thought I'd use this discussion to keep track of some observations I've made while building this and playing around with different training techniques:
I noticed that in Tom's implementation the cross-entropy loss and the generative loss are computed for each instance in the batch and only aggregated at the end. Why is that? I started off using the same approach, but it seemed to slow things down a bit and I didn't have the impression that it had any impact on performance.
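To make the comparison concrete, here is a rough PyTorch-style sketch of the two variants I have in mind (illustrative only, against a hypothetical `model` that returns class logits, not the actual code in either implementation):

```python
import torch
import torch.nn.functional as F

# Illustrative only: `model` maps inputs to class logits,
# `x`, `y` are a batch of real data, `x_gen` are SGLD samples.

def jem_loss_per_instance(model, x, y, x_gen):
    """Variant 1: compute both loss terms instance by instance, aggregate at the end."""
    losses = []
    for i in range(x.shape[0]):
        logits_i = model(x[i : i + 1])
        ce_i = F.cross_entropy(logits_i, y[i : i + 1])
        # Generative term via contrastive divergence, with E(x) = -logsumexp(f(x)).
        gen_i = torch.logsumexp(model(x_gen[i : i + 1]), dim=1).mean() \
            - torch.logsumexp(logits_i, dim=1).mean()
        losses.append(ce_i + gen_i)
    return torch.stack(losses).mean()


def jem_loss_batched(model, x, y, x_gen):
    """Variant 2: one forward pass per term, aggregated directly over the batch."""
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    gen = torch.logsumexp(model(x_gen), dim=1).mean() \
        - torch.logsumexp(logits, dim=1).mean()
    return ce + gen
```

Unless the model contains batch normalisation (where per-instance forward passes change the batch statistics), the two variants should yield the same average gradient, which would be consistent with not seeing a performance difference.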
I've noticed that it's possible to achieve decent performance on both tasks (JEM) even when generating only a handful of instances per batch and epoch. So instead of using SGLD to generate `batch_size` samples each time, it's possible to generate just a few (e.g. 10). This reduces computational cost, but is it also a sound approach?
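For concreteness, this is roughly the kind of SGLD sampler I mean, with the number of generated samples decoupled from the batch size (again just a hedged sketch assuming `model` returns logits, not the package's actual code):

```python
import torch

def sgld_sample(model, shape, n_samples=10, n_steps=20, step_size=1.0, noise_std=0.01):
    """Draw `n_samples` approximate samples from p(x) via SGLD on the energy -logsumexp(f(x))."""
    x = torch.rand(n_samples, *shape) * 2 - 1  # initialise from uniform noise in [-1, 1]
    x.requires_grad_(True)
    for _ in range(n_steps):
        energy = -torch.logsumexp(model(x), dim=1).sum()
        (grad,) = torch.autograd.grad(energy, x)
        with torch.no_grad():
            # Langevin update: gradient step on the energy plus Gaussian noise.
            x = x - step_size * grad + noise_std * torch.randn_like(x)
        x.requires_grad_(True)
    return x.detach()
```

Since both energy terms in the loss are averaged, my intuition is that estimating the model-sample term from e.g. 10 SGLD samples against a batch of 128 real instances keeps the contrastive-divergence gradient estimate unbiased and simply makes it noisier, i.e. it trades compute for gradient variance.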
When drawing from the buffer I may end up drawing a sample from the wrong output class. For example, suppose I want to generate an MNIST image conditional on $y=1$. My buffer is full of samples for each class $y=0,...,9$. If I draw a sample corresponding to $y=9$ for example, doesn't that make my job harder?
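One way around this (a hedged sketch, not what the package currently does) would be to keep a separate buffer pool per class, so that a draw conditioned on $y=1$ can never return a stale $y=9$ sample. The `reinit_fn` below is a hypothetical noise initialiser:

```python
import random
from collections import defaultdict

class ClassConditionalBuffer:
    """Replay buffer with one pool of persistent SGLD chains per output class."""

    def __init__(self, max_per_class=1000):
        self.max_per_class = max_per_class
        self.pools = defaultdict(list)

    def push(self, samples, y):
        """Store generated samples under the class they were conditioned on."""
        pool = self.pools[y]
        pool.extend(samples)
        del pool[: max(0, len(pool) - self.max_per_class)]  # drop oldest when full

    def draw(self, y, n, reinit_fn, reinit_prob=0.05):
        """Draw `n` starting points for class `y`, occasionally re-initialising from noise."""
        pool = self.pools[y]
        return [
            reinit_fn() if (not pool or random.random() < reinit_prob) else random.choice(pool)
            for _ in range(n)
        ]
```

The occasional re-initialisation from noise mirrors the usual persistent-chain trick; the only change is that the pools are keyed by class.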
Surprisingly, I have had difficulty training JEMs on very simple, linearly separable synthetic data.