
Implement validation loss using FID scores and add corresponding documentation #326

Draft — wants to merge 19 commits into base: master

Conversation

IndigoDosSantos

Description

This pull request implements validation loss using FID (Fréchet Inception Distance) scores and adds comprehensive documentation of the feature for the wiki. The implementation includes:

  • Calculation of FID scores at regular intervals (currently after each epoch) using a separate validation image set
  • Storage of FID scores and generated images in the "epochs" folder within the workspace directory
  • Logging of FID scores to TensorBoard for visualization and monitoring
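For orientation, here is a minimal sketch (not the PR's actual code) of how a per-epoch FID score could be computed from a validation image folder and an epoch's sample folder and then logged to TensorBoard. It uses torchmetrics and torchvision for illustration; `validation_dir`, `samples_dir`, and the `validation/fid` tag are assumed names, not identifiers from this PR.

```python
# Minimal sketch, not the PR's implementation: compute FID between a held-out
# validation image set and the samples generated for one epoch, then log it.
from pathlib import Path

import torch
from torch.utils.tensorboard import SummaryWriter
from torchmetrics.image.fid import FrechetInceptionDistance
from torchvision import io, transforms


def load_images(folder: str) -> torch.Tensor:
    # Read PNGs as RGB uint8 tensors and resize to a common shape for batching.
    resize = transforms.Resize((299, 299))
    images = [
        resize(io.read_image(str(p), mode=io.ImageReadMode.RGB))
        for p in sorted(Path(folder).glob("*.png"))
    ]
    return torch.stack(images)


def log_epoch_fid(validation_dir: str, samples_dir: str, epoch: int, writer: SummaryWriter) -> float:
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(load_images(validation_dir), real=True)   # reference distribution
    fid.update(load_images(samples_dir), real=False)     # generated samples for this epoch
    score = fid.compute().item()
    writer.add_scalar("validation/fid", score, global_step=epoch)
    return score
```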

The accompanying documentation covers the following aspects:

  • Explanation of validation loss and its implementation using FID scores
  • Description of how validation loss complements other training metrics
  • Guidance on interpreting validation loss and its benefits for monitoring model performance
  • Details on the relationship between FID scores, model generalization, and overfitting
  • Recommendations for the size of the validation set, with 15% of the total dataset being a good middle ground
  • Implementation considerations for effectively utilizing validation loss, including:
    • Creating a separate validation image set
    • Configuring the "validation_images" concept in concepts.json (see the sketch after this list)
    • Storing FID scores and generated images in the "epochs" folder
    • Calculating FID scores after each epoch and logging them to TensorBoard
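As a rough illustration of the "validation_images" concept idea, the training code could look the concept up by name in concepts.json. The field names below are guesses for illustration, not OneTrainer's exact schema:

```python
# Sketch only: locate the concept reserved for validation images in concepts.json.
# The "name" field is illustrative, not necessarily the exact schema.
import json


def find_validation_concept(concepts_path: str) -> dict | None:
    with open(concepts_path, "r", encoding="utf-8") as f:
        concepts = json.load(f)  # expected: a list of concept dictionaries
    for concept in concepts:
        if concept.get("name") == "validation_images":
            return concept  # e.g. carries the folder of held-out validation images
    return None
```

Note that the review below argues for an explicit configuration option instead of a specially named concept.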

The previous logic incorrectly triggered sampling *before* each epoch, including before any training had happened in the first epoch. This fix ensures the sampling functions run only after a full epoch of training has completed.
- Create a hidden "epochs" folder to store epoch-specific sample subfolders
- Set the "Hidden" attribute for the "epochs" folder using `ctypes.windll.kernel32.SetFileAttributesW` (a sketch of this follows)
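A sketch of what these two commits describe (the function name is made up; note that the attribute call is Windows-only, which the review flags below):

```python
# Sketch: create the per-run "epochs" folder and mark it hidden.
# SetFileAttributesW only exists on Windows, so it is guarded here.
import ctypes
import os


def create_hidden_epochs_dir(workspace_dir: str) -> str:
    epochs_dir = os.path.join(workspace_dir, "epochs")
    os.makedirs(epochs_dir, exist_ok=True)
    if os.name == "nt":
        FILE_ATTRIBUTE_HIDDEN = 0x02
        ctypes.windll.kernel32.SetFileAttributesW(epochs_dir, FILE_ATTRIBUTE_HIDDEN)
    return epochs_dir
```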
This commit adds the calculation of FID scores during the training process. It leverages the `calculate_fid_scores` script and integrates it into the `__sample_during_training` method. The FID scores are computed using the validation images specified in the `concepts.json` file and the generated samples saved in the hidden "epochs" directory.
Nerogar (Owner) commented Jun 8, 2024

I like the idea of adding validation loss. But there are several issues with your implementation that would need to be resolved before it can be merged.

Just naming a few here, but there are definitely more problems.

  1. There is no configuration and everything is hidden from the user. This will lead to a lot of confusion, and could even break possible future models that aren't generative image models.
  2. You hard code multiple paths and file name patterns (see the sketch after this list):
    1. The workspace directory is configurable by the user, but you hard code it as `epochs_dir = "workspace/run/epochs"`.
    2. You assume that the concepts are always stored in `os.path.join(os.path.dirname(__file__), "..", "..", "training_concepts")`, but that file can be stored anywhere, or even inside the TrainConfig object.
    3. The sample file names might change in the future, but you assume that they always use a fixed naming scheme.
    4. You assume that there is a special concept for validation images instead of adding a configuration option for it.
    5. `ctypes.windll.kernel32.SetFileAttributesW(os.path.join(self.config.workspace_dir, "epochs"), 0x02)`: this only works on Windows. And why would you do this anyway?
  3. calculate_fid_scores.py is saved in scripts, but it's not executable. Scripts should all have the same structure; you can compare with any of the existing ones.
  4. What happens if the user doesn't sample regularly? Then the validation score can't be calculated.
  5. `sys.path.append(scripts_dir)`: why? Just use an import statement.
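To illustrate the direction item 2 points in, paths could be derived from the configuration the user already controls rather than hardcoded. `workspace_dir` is mentioned in the review above; `concept_file_name` is a hypothetical field used only for illustration:

```python
# Sketch of the fix suggested in item 2: build paths from the configuration
# instead of hardcoding "workspace/run/epochs" or a fixed concepts location.
import os


def get_epochs_dir(config) -> str:
    # config.workspace_dir is the user-configurable workspace directory.
    return os.path.join(config.workspace_dir, "epochs")


def get_concepts_path(config) -> str:
    # Hypothetical field: read the concepts file location from the config
    # rather than assuming a "training_concepts" folder next to the script.
    return config.concept_file_name
```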

(A comment from @FurkanGozukara was marked as off-topic.)

IndigoDosSantos (Author) replied:

@Nerogar

Thanks for reviewing the PR and providing feedback. I'll carefully consider it and work on addressing the necessary changes as time permits.

@IndigoDosSantos IndigoDosSantos marked this pull request as draft June 9, 2024 12:47
IndigoDosSantos (Author) commented:

Addressing point 1:

  • Add configuration options as requested (a rough sketch follows this list), possibly for:
    • The validation image set
  • Ensure all relevant information is visible to the user, not hidden
  • Regarding compatibility with non-generative image models:
    • FID is specifically designed for comparing image distributions and is not suitable for non-generative image models
    • Other metrics would need to be used to evaluate non-generative models
    • This implementation was never intended to provide a validation loss metric for models beyond generative image models
    • While similar concepts can be applied to other domains such as text or audio, FID itself is specific to image distributions
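Purely as a strawman for discussion, an explicit, user-visible configuration for validation might look something like the following. None of these field names exist in OneTrainer today; they are placeholders:

```python
# Hypothetical sketch only: an explicit configuration block for validation,
# so nothing is hidden from the user. Field names are invented for illustration.
from dataclasses import dataclass


@dataclass
class ValidationConfig:
    enabled: bool = False              # opt-in, so non-image models are unaffected
    validation_image_dir: str = ""     # folder containing the held-out validation images
    fid_interval_epochs: int = 1       # compute FID after every N epochs
```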

The current pull request places the validation image configuration within the concepts tab, which is not ideal. To improve the user experience and address organizational issues, I propose the following changes:

  1. Create a dedicated tab for training data, including both concepts and validation images¹
  2. Rename the "data" tab to "data processing" to clarify its purpose
  3. Reorganize the UI to group related settings and remove any confusion

Example of the new structure:
General
Model
Data Processing
├── Aspect Ratio Bucketing
├── Latent Caching
└── Clear Cache Before Training
Training Data
├── Concepts
└── Validation Images
Training
Sampling
Backup
Tools
Additional Embeddings

@Nerogar : What do you think about this?

Footnotes

  1. Concepts and validation images are both subsets of the training data. Concepts are curated sets of images used to guide the model's adaptation towards specific subjects or styles, while validation images are randomly selected from the training data to evaluate the model's performance on unseen data.
