
Implement validation loss using FID scores and add corresponding documentation #326

Draft — wants to merge 19 commits into base: master

Conversation

IndigoDosSantos

Description

This pull request implements validation loss using FID (Fréchet Inception Distance) scores and adds comprehensive documentation of the feature for the wiki. The implementation includes:

  • Calculation of FID scores at regular intervals (currently after each epoch) using a separate validation image set
  • Storage of FID scores and generated images in the "epochs" folder within the workspace directory
  • Logging of FID scores to TensorBoard for visualization and monitoring
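For orientation, here is a minimal sketch (not the PR's actual code) of how a per-epoch FID score could be computed from a validation image folder and an epoch's sample folder and then logged to TensorBoard. It uses torchmetrics and torchvision for illustration; `validation_dir`, `samples_dir`, and the `validation/fid` tag are assumed names, not identifiers from this PR.

```python
# Minimal sketch, not the PR's implementation: compute FID between a held-out
# validation image set and the samples generated for one epoch, then log it.
from pathlib import Path

import torch
from torch.utils.tensorboard import SummaryWriter
from torchmetrics.image.fid import FrechetInceptionDistance
from torchvision import io, transforms


def load_images(folder: str) -> torch.Tensor:
    # Read PNGs as RGB uint8 tensors and resize to a common shape for batching.
    resize = transforms.Resize((299, 299))
    images = [
        resize(io.read_image(str(p), mode=io.ImageReadMode.RGB))
        for p in sorted(Path(folder).glob("*.png"))
    ]
    return torch.stack(images)


def log_epoch_fid(validation_dir: str, samples_dir: str, epoch: int, writer: SummaryWriter) -> float:
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(load_images(validation_dir), real=True)   # reference distribution
    fid.update(load_images(samples_dir), real=False)     # generated samples for this epoch
    score = fid.compute().item()
    writer.add_scalar("validation/fid", score, global_step=epoch)
    return score
```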

The accompanying documentation covers the following aspects:

  • Explanation of validation loss and its implementation using FID scores
  • Description of how validation loss complements other training metrics
  • Guidance on interpreting validation loss and its benefits for monitoring model performance
  • Details on the relationship between FID scores, model generalization, and overfitting
  • Recommendations for the size of the validation set, with 15% of the total dataset being a good middle ground
  • Implementation considerations for effectively utilizing validation loss, including:
    • Creating a separate validation image set
    • Configuring the "validation_images" concept in concepts.json (see the sketch after this list)
    • Storing FID scores and generated images in the "epochs" folder
    • Calculating FID scores after each epoch and logging them to TensorBoard
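As a rough illustration of the "validation_images" concept idea, the training code could look the concept up by name in concepts.json. The field names below are guesses for illustration, not OneTrainer's exact schema:

```python
# Sketch only: locate the concept reserved for validation images in concepts.json.
# The "name" field is illustrative, not necessarily the exact schema.
import json


def find_validation_concept(concepts_path: str) -> dict | None:
    with open(concepts_path, "r", encoding="utf-8") as f:
        concepts = json.load(f)  # expected: a list of concept dictionaries
    for concept in concepts:
        if concept.get("name") == "validation_images":
            return concept  # e.g. carries the folder of held-out validation images
    return None
```

Note that the review below argues for an explicit configuration option instead of a specially named concept.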

The previous logic incorrectly triggered sampling *before* each epoch, including before any training had happened in the first epoch. This fix ensures the sampling functions run only after a full epoch of training has completed.
- Create a hidden "epochs" folder to store epoch-specific sample subfolders
- Set the "Hidden" attribute for the "epochs" folder using `ctypes.windll.kernel32.SetFileAttributesW` (a sketch of this follows)
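A sketch of what these two commits describe (the function name is made up; note that the attribute call is Windows-only, which the review flags below):

```python
# Sketch: create the per-run "epochs" folder and mark it hidden.
# SetFileAttributesW only exists on Windows, so it is guarded here.
import ctypes
import os


def create_hidden_epochs_dir(workspace_dir: str) -> str:
    epochs_dir = os.path.join(workspace_dir, "epochs")
    os.makedirs(epochs_dir, exist_ok=True)
    if os.name == "nt":
        FILE_ATTRIBUTE_HIDDEN = 0x02
        ctypes.windll.kernel32.SetFileAttributesW(epochs_dir, FILE_ATTRIBUTE_HIDDEN)
    return epochs_dir
```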
This commit adds the calculation of FID scores during the training process. It leverages the `calculate_fid_scores` script and integrates it into the `__sample_during_training` method. The FID scores are computed using the validation images specified in the `concepts.json` file and the generated samples saved in the hidden "epochs" directory.
Nerogar (Owner) commented Jun 8, 2024

I like the idea of adding validation loss. But there are several issues with your implementation that would need to be resolved before it can be merged.

Just naming a few here, but there are definitely more problems.

  1. There is no configuration and everything is hidden from the user. This will lead to a lot of confusion, and could even break possible future models that aren't generative image models.
  2. You hard code multiple paths and file name patterns (see the sketch after this list):
    1. The workspace directory is configurable by the user, but you hard code it as `epochs_dir = "workspace/run/epochs"`.
    2. You assume that the concepts are always stored in `os.path.join(os.path.dirname(__file__), "..", "..", "training_concepts")`, but that file can be stored anywhere, or even inside the TrainConfig object.
    3. The sample file names might change in the future, but you assume that they always use a fixed naming scheme.
    4. You assume that there is a special concept for validation images instead of adding a configuration option for it.
    5. `ctypes.windll.kernel32.SetFileAttributesW(os.path.join(self.config.workspace_dir, "epochs"), 0x02)`: this only works on Windows. And why would you do this anyway?
  3. calculate_fid_scores.py is saved in scripts, but it's not executable. Scripts should all have the same structure; you can compare with any of the existing ones.
  4. What happens if the user doesn't sample regularly? Then the validation score can't be calculated.
  5. `sys.path.append(scripts_dir)`: why? Just use an import statement.
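To illustrate the direction item 2 points in, paths could be derived from the configuration the user already controls rather than hardcoded. `workspace_dir` is mentioned in the review above; `concept_file_name` is a hypothetical field used only for illustration:

```python
# Sketch of the fix suggested in item 2: build paths from the configuration
# instead of hardcoding "workspace/run/epochs" or a fixed concepts location.
import os


def get_epochs_dir(config) -> str:
    # config.workspace_dir is the user-configurable workspace directory.
    return os.path.join(config.workspace_dir, "epochs")


def get_concepts_path(config) -> str:
    # Hypothetical field: read the concepts file location from the config
    # rather than assuming a "training_concepts" folder next to the script.
    return config.concept_file_name
```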

(A comment from @FurkanGozukara was marked as off-topic.)

IndigoDosSantos (Author) replied:

@Nerogar

Thanks for reviewing the PR and providing feedback. I'll carefully consider it and work on addressing the necessary changes as time permits.

@IndigoDosSantos IndigoDosSantos marked this pull request as draft June 9, 2024 12:47
IndigoDosSantos (Author) commented:

Addressing point 1:

  • Add configuration options as requested (a rough sketch follows this list), possibly for:
    • The validation image set
  • Ensure all relevant information is visible to the user, not hidden
  • Regarding compatibility with non-generative image models:
    • FID is specifically designed for comparing image distributions and is not suitable for non-generative image models
    • Other metrics would need to be used to evaluate non-generative models
    • This implementation was never intended to provide a validation loss metric for models beyond generative image models
    • While similar concepts can be applied to other domains such as text or audio, FID itself is specific to image distributions
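Purely as a strawman for discussion, an explicit, user-visible configuration for validation might look something like the following. None of these field names exist in OneTrainer today; they are placeholders:

```python
# Hypothetical sketch only: an explicit configuration block for validation,
# so nothing is hidden from the user. Field names are invented for illustration.
from dataclasses import dataclass


@dataclass
class ValidationConfig:
    enabled: bool = False              # opt-in, so non-image models are unaffected
    validation_image_dir: str = ""     # folder containing the held-out validation images
    fid_interval_epochs: int = 1       # compute FID after every N epochs
```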

The current pull request places the validation image configuration within the concepts tab, which is not ideal. To improve the user experience and address organizational issues, I propose the following changes:

  1. Create a dedicated tab for training data, including both concepts and validation images¹
  2. Rename the "data" tab to "data processing" to clarify its purpose
  3. Reorganize the UI to group related settings and remove any confusion

Example of the new structure:
General
Model
Data Processing
├── Aspect Ratio Bucketing
├── Latent Caching
└── Clear Cache Before Training
Training Data
├── Concepts
└── Validation Images
Training
Sampling
Backup
Tools
Additional Embeddings

@Nerogar : What do you think about this?

Footnotes

  1. Concepts and validation images are both subsets of the training data. Concepts are curated sets of images used to guide the model's adaptation towards specific subjects or styles, while validation images are randomly selected from the training data to evaluate the model's performance on unseen data.
