Different disparity levels? #3

Open
aellaboudy opened this issue Oct 17, 2017 · 5 comments

@aellaboudy

In this work, you've hardcoded the expected disparity levels to ±16. Is there anything stopping us from increasing them to a larger range, like ±128? There are other datasets I'd like to test on that have disparities of that magnitude. What would need to change in the code to enable larger disparity searches?

@LouisFoucard
Owner

So the Selection layer takes a list of disparity levels as input at initialization. When constructing the network, I hardcoded ±16 in the build function, but that can be changed to any other list of disparity levels. The other thing you will have to be careful about is making sure that the number of channels in the last convolution before the selection layer matches the total number of disparity levels you want. For example, if you want a total of 256 disparity levels, the input to the selection layer must look like (batchsize, H, W, 256). I will make the disparity levels an input to the network-building function in a future push.
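For concreteness, here is a minimal sketch of a selection layer parameterized by a list of disparity levels. This is a TensorFlow illustration of the idea, not the repository's actual code, and the border handling (tf.roll wraps around) is a simplification:

```python
import tensorflow as tf

def selection_layer(probs, image, disparity_levels):
    """probs: (batch, H, W, D) softmax volume whose channel count D must
    equal len(disparity_levels); image: (batch, H, W, C) view to shift.
    Returns the expected image under the predicted disparity distribution."""
    # Shift the image horizontally by each candidate disparity.
    # Note: tf.roll wraps at the border; a real layer would zero-pad.
    shifted = [tf.roll(image, shift=d, axis=2) for d in disparity_levels]
    stack = tf.stack(shifted, axis=-1)        # (batch, H, W, C, D)
    weights = tf.expand_dims(probs, axis=3)   # (batch, H, W, 1, D)
    return tf.reduce_sum(stack * weights, axis=-1)
```

The channel constraint falls out of the shapes: with 256 disparity levels, `probs` must arrive as (batchsize, H, W, 256).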

@aellaboudy
Author

Yes, I realized that changing the hardcoded ±16 value was not enough. I was able to get the model to compile after changing the dimensions in your model. You can see my work at https://github.com/aellaboudy/w-net.

However, the memory requirements quickly grow too large and I run out of memory during training. Even when memory is not a problem, the loss becomes unstable and goes to NaN after a few epochs.

@aellaboudy
Author

It seems like with larger disparity values you must reduce the learning rate. I'm using a learning rate of 1e-7 now, and that seems to produce stable loss metrics. I also changed the disparity ranges to [0, 64] for the left disparity and [-64, 0] for the right disparity. I'm assuming the images are rectified, so the left and right disparities can only take on positive and negative values, respectively. This should hold for the dataset I'm using, KITTI.
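Concretely, the change amounts to something like the following (illustrative Keras-style code; the variable names are mine, not the repo's):

```python
from tensorflow.keras.optimizers import Adam

# One-sided disparity ranges for rectified pairs: non-negative for the
# left view, non-positive for the right (my assumption, described above).
left_disparity_levels = list(range(0, 65))    # [0, 64]
right_disparity_levels = list(range(-64, 1))  # [-64, 0]

# Much smaller learning rate to keep the loss from diverging to NaN.
# clipnorm is an extra safeguard worth trying, not something the repo uses.
optimizer = Adam(learning_rate=1e-7, clipnorm=1.0)
```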

I also added one more layer to your model, plus a consistency check in the loss to make sure the left and right disparity maps agree.
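The consistency term is in the spirit of the standard left-right check; a rough sketch under the sign conventions above (nearest-neighbour sampling for brevity; my formulation, not necessarily the exact term I used):

```python
import tensorflow as tf

def lr_consistency_loss(disp_left, disp_right):
    """disp_*: (batch, H, W) disparity maps in pixels. With left
    disparities in [0, 64] and right in [-64, 0], the right map sampled
    at the left map's match x - d_L(x) should equal -d_L(x)."""
    w = tf.shape(disp_left)[-1]
    xs = tf.cast(tf.range(w), tf.float32)[None, None, :]  # pixel columns
    target_x = xs - disp_left                             # x - d_L(x)
    idx = tf.clip_by_value(tf.cast(tf.round(target_x), tf.int32), 0, w - 1)
    disp_right_warped = tf.gather(disp_right, idx, batch_dims=2)
    return tf.reduce_mean(tf.abs(disp_left + disp_right_warped))
```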

The results I'm getting so far on the KITTI dataset are not that great: the disparity output seems very quantized, and I see "squares" in it equal in size to the max disparity. I'm trying to debug what is going wrong; any help appreciated.

@aellaboudy
Author

After 100 epochs on the KITTI dataset, results are not stellar even after stretching the disparity levels to [0,120] and [-120,0] for the left and right images, respectively. Sample image below.

[image: sample disparity output after 100 epochs on KITTI]

It would be great to understand what is stopping it from working well with larger disparities. One theory I have is that the KITTI dataset is much more challenging because of illumination changes (shadows, etc.), which make for very non-uniform image gradients on smooth depth surfaces. See the image above. This paper has some pretty interesting ideas on how to get around illumination changes with slightly more complex loss functions and a more complicated architecture.

@Ockhius commented Dec 6, 2018

It seems to me that the main thing keeping the network from predicting larger disparities is its receptive field. I made a rough calculation, and it looks like the receptive field for a pixel at the output is around 160 pixels ([-80, +80]). If the network looks for correlation in the vicinity of a pixel to compute shift/disparity, it can't gather enough information for the disparities you want it to handle. Precision should also decrease towards higher disparity values. A deeper architecture might widen the valid disparity interval, though it would be harder to train.
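For reference, the standard back-of-envelope recursion for a conv stack's receptive field; the layer specs below are made up for illustration, and you would plug in the actual w-net layers to reproduce the ~160 px estimate:

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= s             # stride compounds the step between samples
    return rf

# Hypothetical encoder: alternating 3x3 convs, every other one strided.
print(receptive_field([(3, 1), (3, 2)] * 4))  # -> 61 pixels
```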

  • The paper you mentioned solves this by forming a cost volume before pushing it through the disparity calculation. It is similar to the shifted_image stacks in this work. The receptive field is no longer limiting, since the input pixels are already aligned with respect to the possible disparity values. For example, if 120 is a valid disparity, then at the 120th voxel on the disparity axis of the input volume you will have features corresponding to locations (x, y) and (x+120, y) in the left and right images; see the sketch below.
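A rough sketch of that kind of cost-volume construction, to make the indexing concrete (my formulation of the general idea, not the paper's code):

```python
import tensorflow as tf

def build_cost_volume(feat_left, feat_right, max_disp):
    """feat_*: (batch, H, W, C) feature maps. Returns a volume of shape
    (batch, max_disp + 1, H, W, 2C), where voxel d on the disparity axis
    pairs feat_left[:, y, x] with feat_right[:, y, x + d]."""
    w = tf.shape(feat_right)[2]
    slices = []
    for d in range(max_disp + 1):
        # Shift right-image features left by d, zero-padding the border.
        padded = tf.pad(feat_right, [[0, 0], [0, 0], [0, d], [0, 0]])
        slices.append(tf.concat([feat_left, padded[:, :, d:d + w, :]], axis=-1))
    return tf.stack(slices, axis=1)
```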
