Question with Model Reproduction: performance gap on test set #156

Open
XZYuann opened this issue Dec 13, 2024 · 0 comments

XZYuann commented Dec 13, 2024

Hi! Thank you for your excellent work on this project! However, we have noticed a discrepancy in performance between our implementation and the results presented in the paper.

**Model:** We train with

```
python main.py --train --base configs/stableSRNew/v2-finetune_text_T_512.yaml --gpus GPU_ID, --name NAME --scale_lr False
```

and obtain the corresponding fine-tuned checkpoint. We then run inference with

```
python scripts/sr_val_ddpm_text_T_vqganfin_old.py --config configs/stableSRNew/v2-finetune_text_T_512.yaml --ckpt CKPT_PATH --vqgan_ckpt VQGANCKPT_PATH --init-img INPUT_PATH --outdir OUT_DIR --ddpm_steps 200 --dec_w 0.5 --colorfix_type adain
```

using the pretrained CFW decoder (vqgan_cfw_00011.ckpt) and our fine-tuned weights.
**Training dataset:** DIV2K, DIV8K, Flickr2K, and OutdoorSceneTraining, trained for about 300 epochs when training the time-aware encoder with SFT; we use the released vqgan_cfw_00011.ckpt for the decoder.
**Test dataset:** DIV2K_valid_HR
**Evaluation:** We use IQA-PyTorch to evaluate all metrics (a sketch of our evaluation follows the results table below).
| Results | PSNR | SSIM | LPIPS | FID | CLIP-IQA | MUSIQ |
|---|---|---|---|---|---|---|
| Ours | 21.81 | 0.5297 | 0.3481 | 27.76 | 0.6014 | 63.75 |
| Paper | 23.26 | 0.5726 | 0.3114 | 24.44 | 0.6771 | 65.92 |
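For reference, our metric computation is roughly along the lines of the minimal sketch below using IQA-PyTorch (pyiqa). The file paths are placeholders, and whether PSNR/SSIM should be computed on RGB or on the Y channel (`test_y_channel=True`) is an assumption on our side; this convention alone can shift PSNR/SSIM noticeably.

```python
import torch
import pyiqa

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Full-reference metrics (SR output vs. HR ground truth).
# Assumption: computing PSNR/SSIM on RGB vs. on the Y channel
# (test_y_channel=True) gives different numbers; we are unsure
# which convention the paper uses.
psnr = pyiqa.create_metric('psnr', device=device)
ssim = pyiqa.create_metric('ssim', device=device)
lpips = pyiqa.create_metric('lpips', device=device)

# No-reference metrics (SR output only).
clipiqa = pyiqa.create_metric('clipiqa', device=device)
musiq = pyiqa.create_metric('musiq', device=device)

# Placeholder paths for a single image pair.
sr_path = 'OUT_DIR/0801.png'
hr_path = 'DIV2K_valid_HR/0801.png'

print('PSNR    :', psnr(sr_path, hr_path).item())
print('SSIM    :', ssim(sr_path, hr_path).item())
print('LPIPS   :', lpips(sr_path, hr_path).item())
print('CLIP-IQA:', clipiqa(sr_path).item())
print('MUSIQ   :', musiq(sr_path).item())

# FID is computed between the two image folders, not per image pair.
fid = pyiqa.create_metric('fid')
print('FID     :', fid('OUT_DIR', 'DIV2K_valid_HR'))
```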

Given the sizeable gap in results, could you please clarify the following?

- Are there any subtle differences in the training process that could affect the results?
- Could there be differences in how the datasets were preprocessed or augmented during training?
- Could a discrepancy in the random seed or initialization affect reproducibility?
- Are there any specific configurations or settings behind the higher performance reported in the paper?
I would appreciate any guidance on how to resolve this issue. Thank you!
