
Evaluation Mismatch with Paper Result #23

Open
YouyuChen0207 opened this issue Jan 30, 2024 · 2 comments

@YouyuChen0207

Hello, thanks for your great work! We are very interested in building on it further.

However, there is a small issue. When we tried to reproduce the cross-scene generalization evaluation with the released checkpoint '720000.pth', we found the results are significantly lower than those reported in your paper. The results are shown below.

| Name | PSNR | SSIM | LPIPS |
| --- | --- | --- | --- |
| llff-paper | 25.86 | 0.867 | 0.116 |
| llff-reproduce | 25.53 | 0.855 | 0.130 |
| blender-paper | 27.29 | 0.937 | 0.056 |
| blender-reproduce | 26.02 | 0.926 | 0.073 |

Do you have any idea what might cause this? It is important for us to reproduce your paper accurately.
All evaluation settings strictly follow the README, and the evaluation was performed on a single NVIDIA RTX 3090 GPU.

By the way, could you tell us what hardware you used for training?

Thanks for your attention!

@MukundVarmaT
Collaborator

Thank you for your interest in our work!

We have observed that although GNT renders reasonably well in most cases, regions with a plain background tend to come out a shade darker than the ground truth (an inherent drawback of using attention), for example the white background in the NeRF Synthetic scenes. To verify, try identifying the background (using the ground-truth mask or any other segmentation method), force-setting it to white, and then recomputing the metrics above. That should reproduce the reported results on NeRF Synthetic (i.e., blender).
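If it helps, here is a rough sketch of that check. This is not our evaluation code; it assumes `pred` and `gt` are `(H, W, 3)` float arrays in `[0, 1]`, that `bg_mask` is an `(H, W)` boolean array that is `True` on background pixels, and it uses `skimage`/`lpips` for the metrics — adapt it to however you compute them:

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_with_white_bg(pred, gt, bg_mask, lpips_fn=None):
    """Force the predicted background to white, then recompute the metrics.

    pred, gt : float arrays of shape (H, W, 3) in [0, 1]  (assumed convention)
    bg_mask  : bool array of shape (H, W), True on background pixels
    """
    pred = pred.copy()
    pred[bg_mask] = 1.0  # force-set background pixels to white
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    out = {"psnr": psnr, "ssim": ssim}
    if lpips_fn is not None:
        # lpips expects NCHW tensors scaled to [-1, 1]
        to_t = lambda x: torch.from_numpy(x).float().permute(2, 0, 1)[None] * 2 - 1
        out["lpips"] = lpips_fn(to_t(pred), to_t(gt)).item()
    return out

# usage: score_with_white_bg(pred, gt, bg_mask, lpips_fn=lpips.LPIPS(net="vgg"))
```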

In the case of NeRF LLFF, the minor differences could be due to random sampling (since we only use a coarse model). Also, if you look around the edges of the image, the epipolar projections are noisier (they get worse farther from the camera), which causes some artifacts there. A trick to further improve the numbers (not used for the metrics in the paper) is to crop the rendered image and measure PSNR only in the cropped region.
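Something along these lines, reusing `peak_signal_noise_ratio` from the sketch above; the crop fraction here is arbitrary for illustration, not a value we used anywhere:

```python
def cropped_psnr(pred, gt, crop_frac=0.1):
    """PSNR over the central region only, discarding a border where the
    epipolar projections (and thus the renderings) are noisier."""
    h, w = gt.shape[:2]
    m = int(round(crop_frac * min(h, w)))
    return peak_signal_noise_ratio(gt[m:h - m, m:w - m],
                                   pred[m:h - m, m:w - m],
                                   data_range=1.0)
```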

I hope this information helps.

@MukundVarmaT
Collaborator

MukundVarmaT commented Jan 31, 2024

If I remember correctly, we used 8×48 GB GPUs for training, fitting 512 rays per GPU.
