
Evaluation Mismatch with Paper Result #23

Open
YouyuChen0207 opened this issue Jan 30, 2024 · 2 comments

@YouyuChen0207

Hello, thanks for your great work! We are very interested in building on it further.

However, there is a small issue. When we tried to reproduce the cross-scene generalization evaluation with the released checkpoint '720000.pth', we found the results are significantly lower than those reported in your paper. The results are shown below.

| Name | PSNR | SSIM | LPIPS |
| --- | --- | --- | --- |
| llff-paper | 25.86 | 0.867 | 0.116 |
| llff-reproduce | 25.53 | 0.855 | 0.130 |
| blender-paper | 27.29 | 0.937 | 0.056 |
| blender-reproduce | 26.02 | 0.926 | 0.073 |

Do you have any idea what might cause this? It is important for us to reproduce your paper accurately.
All evaluation settings strictly follow the README, and the evaluation was performed on a single NVIDIA RTX 3090 GPU.

By the way, could you tell us what hardware you used for training?

Thanks for your attention!

@MukundVarmaT
Collaborator

Thank you for your interest in our work!

We have observed that although GNT renders reasonably well in most cases, regions with a plain background tend to come out a shade darker than the ground truth (an inherent drawback of using attention), for example the white background in the NeRF Synthetic scenes. To verify, try identifying the background (using the ground-truth mask or any other segmentation method), force-setting it to white, and then recomputing the metrics above. That should reproduce the reported results on NeRF Synthetic (i.e., blender).
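If it helps, here is a rough sketch of that check. This is not our evaluation code; it assumes `pred` and `gt` are `(H, W, 3)` float arrays in `[0, 1]`, that `bg_mask` is an `(H, W)` boolean array that is `True` on background pixels, and it uses `skimage`/`lpips` for the metrics — adapt it to however you compute them:

```python
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_with_white_bg(pred, gt, bg_mask, lpips_fn=None):
    """Force the predicted background to white, then recompute the metrics.

    pred, gt : float arrays of shape (H, W, 3) in [0, 1]  (assumed convention)
    bg_mask  : bool array of shape (H, W), True on background pixels
    """
    pred = pred.copy()
    pred[bg_mask] = 1.0  # force-set background pixels to white
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    out = {"psnr": psnr, "ssim": ssim}
    if lpips_fn is not None:
        # lpips expects NCHW tensors scaled to [-1, 1]
        to_t = lambda x: torch.from_numpy(x).float().permute(2, 0, 1)[None] * 2 - 1
        out["lpips"] = lpips_fn(to_t(pred), to_t(gt)).item()
    return out

# usage: score_with_white_bg(pred, gt, bg_mask, lpips_fn=lpips.LPIPS(net="vgg"))
```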

In the case of NeRF LLFF, the minor differences could be due to random sampling (since we only use a coarse model). Also, if you look around the edges of the image, the epipolar projections are noisier (they get worse farther from the camera), which causes some artifacts there. A trick to further improve the numbers (not used for the metrics in the paper) is to crop the rendered image and measure PSNR only in the cropped region.
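Something along these lines, reusing `peak_signal_noise_ratio` from the sketch above; the crop fraction here is arbitrary for illustration, not a value we used anywhere:

```python
def cropped_psnr(pred, gt, crop_frac=0.1):
    """PSNR over the central region only, discarding a border where the
    epipolar projections (and thus the renderings) are noisier."""
    h, w = gt.shape[:2]
    m = int(round(crop_frac * min(h, w)))
    return peak_signal_noise_ratio(gt[m:h - m, m:w - m],
                                   pred[m:h - m, m:w - m],
                                   data_range=1.0)
```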

I hope this information helps.

@MukundVarmaT
Collaborator

MukundVarmaT commented Jan 31, 2024

If I remember correctly, we used 8×48 GB GPUs for training, fitting 512 rays per GPU.
