
[Request] Robust Multiple-Baseline Stereo matching for robust stereoscopy in repetitive-pattern environments (RMBS-stereo) #675

Open
stephansturges opened this issue Sep 15, 2022 · 21 comments

Comments

@stephansturges

stephansturges commented Sep 15, 2022

Start with Why?

When using stereo devices "in the wild" on human-made objects it is extremely common to encounter repetitive patterns on the objects that you want to retrieve depth from. Brick walls, cobblestone roads, roof shingles, tiling, etc.: these often constitute the majority of a scene in an urban environment.

Unfortunately, stereo matching as implemented in DepthAI is not particularly good at retrieving depth from these types of patterns, for reasons that are well documented in the history of development of stereoscopy algorithms. See here for more information:
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.174.152&rep=rep1&type=pdf

This can be observed with the current Luxonis devices, such as in this scene where a CM4-PoE device is pointed down at cobblestones from a height of approx. 10 m. Notice that the depth estimation is inconclusive in the area that shows the most repetition. In the current state of the DepthAI library this cannot be solved by tuning parameters in the stereo matching cost function.
[image: stereo depth output over the cobblestone scene, with missing depth in the most repetitive area]

(see more examples here: https://discuss.luxonis.com/d/875-depth-parameters-configuration-testing-code/4 )
How can this be solved?

There are different approaches to the solution, but the most promising seems to be using multiple-baseline cameras. See this paper for one implementation and details:
https://www.researchgate.net/publication/3916660_A_robust_stereo-matching_algorithm_using_multiple-baseline_cameras

This has also been mentioned in the context of retrieving additional information from the planned Long-Range device by @ynjiun in this thread on the hardware repo: luxonis/depthai-hardware#247

In the context of DepthAI the implementation would require using the RGB camera on OAK-D devices in a desaturated mode as a third mono source, and using it to calculate additional disparity maps against one or both of the mono sensors to refine the local variance value for each pixel in a sliding window.
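
For context, here is a rough numpy sketch of the core idea from the Okutomi-Kanade paper linked above (window SSD costs summed over several baselines in inverse-distance space); the function, its arguments (rectified grayscale arrays, per-pair baselines, focal length in pixels, candidate inverse distances) and the integer shift are illustrative placeholders, not part of DepthAI:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sssd_inverse_distance(ref, others, baselines, focal, zetas, win=7):
    """Multiple-baseline stereo in the spirit of Okutomi-Kanade:
    for each candidate inverse distance zeta, every non-reference image is
    shifted by its own disparity d_i = focal * B_i * zeta, window SSD costs
    are accumulated over all pairs, and the zeta with the minimum summed
    cost (SSSD) wins. Matches that are ambiguous for a single baseline on
    repetitive textures tend to disagree across baselines, so the summed
    cost has a sharper, less ambiguous minimum."""
    cost = np.zeros((len(zetas),) + ref.shape, dtype=np.float32)
    ref_f = ref.astype(np.float32)
    for zi, zeta in enumerate(zetas):
        for img, b in zip(others, baselines):
            d = focal * b * zeta                        # disparity for this baseline
            # crude integer shift; the sign depends on the camera geometry
            shifted = np.roll(img.astype(np.float32), int(round(d)), axis=1)
            ssd = (ref_f - shifted) ** 2
            cost[zi] += uniform_filter(ssd, size=win)   # window-summed SSD (up to a scale)
    best = np.argmin(cost, axis=0)                      # per-pixel index of the best zeta
    return np.asarray(zetas)[best]                      # per-pixel inverse distance
```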

It would be very useful for the type of work I am doing to see an evolution to this type of stereo matching to solve the problem of repeating textures, and I'm sure it would benefit many in the community!

@stephansturges stephansturges changed the title Request: robust multiple-baseline stereo matching for robust stereoscopy in repetitive-pattern environments [Request] Robust Multiple-Baseline Stereo matching for robust stereoscopy in repetitive-pattern environments (RMBS-stereo) Sep 15, 2022
@ynjiun

ynjiun commented Sep 16, 2022

@stephansturges
could you share your captured left/right images of this "repetitive texture" case?
I would like to try my algorithm to see if it can alleviate the "blue" area (no depth).
Thanks.

@stephansturges
Author

> @stephansturges could you share your captured left/right images of this "repetitive texture" case? I would like to try my algorithm to see if it can alleviate the "blue" area (no depth). Thanks.

Thanks for offering to do the test!
You can find example mono files of this location here: https://drive.google.com/drive/folders/14JB64ApZJRZm_Zf1rJe_kx52A7xSQEUp?usp=sharing

I moved the camera a few times during capture to provide different examples, but the main cobblestone area was always a "hole" in the depth map.

For reference this was the stereo output during capture of mono images from the depthai pipeline:
[screenshot: stereo depth output during capture of the mono images, 2022-09-19 11:53]

@ynjiun

ynjiun commented Sep 20, 2022

@stephansturges
Thank you for sharing the stereo images. One more thing we need is the camera-to-camera calibration parameters (stereo: S, K, D, R, T, S_rect, R_rect, P_rect matrices) from your unit. Thank you very much.

@stephansturges
Author

> @stephansturges Thank you for sharing the stereo images. One more thing we need is the camera-to-camera calibration parameters (stereo: S, K, D, R, T, S_rect, R_rect, P_rect matrices) from your unit. Thank you very much.

Is there a specific way to retrieve these with the depthai package? Or should I perform an OpenCV calibration to get these params?

@ynjiun

ynjiun commented Sep 21, 2022

you could export them from your unit using depthai package, see example code here
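
For reference, a minimal sketch along the lines of the depthai calibration_reader example (socket names assume a standard OAK-D layout; treat this as an illustration rather than the linked example itself):

```python
import depthai as dai
import numpy as np

with dai.Device() as device:
    calib = device.readCalibration()

    # Intrinsics (3x3) at the mono sensors' native 1280x800 resolution
    K_left = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.LEFT, 1280, 800))
    K_right = np.array(calib.getCameraIntrinsics(dai.CameraBoardSocket.RIGHT, 1280, 800))

    # Distortion coefficients (k1, k2, p1, p2, k3, ...)
    D_left = np.array(calib.getDistortionCoefficients(dai.CameraBoardSocket.LEFT))
    D_right = np.array(calib.getDistortionCoefficients(dai.CameraBoardSocket.RIGHT))

    # Stereo rectification rotations and left->right extrinsics (4x4 homogeneous)
    R1 = np.array(calib.getStereoLeftRectificationRotation())
    R2 = np.array(calib.getStereoRightRectificationRotation())
    T_lr = np.array(calib.getCameraExtrinsics(dai.CameraBoardSocket.LEFT,
                                              dai.CameraBoardSocket.RIGHT))

    print("K_left:\n", K_left, "\nK_right:\n", K_right)
    print("D_left:", D_left, "\nD_right:", D_right)
    print("R1:\n", R1, "\nR2:\n", R2, "\nT left->right:\n", T_lr)
```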

@ynjiun

ynjiun commented Sep 23, 2022

@stephansturges
Hi, just to confirm: are the images you shared stereoDepth.rectifiedLeft/rectifiedRight, or left/right before rectification? Please advise. Thank you.

@Erol444
Member

Erol444 commented Sep 24, 2022

Thoughts @szabi-luxonis on the last approach (using color cam on OAK-D-* for another stereo pair)?

@stephansturges
Author

> @stephansturges Hi, just to confirm: are the images you shared stereoDepth.rectifiedLeft/rectifiedRight, or left/right before rectification? Please advise. Thank you.

These images are not rectified, as far as I can remember. I will get back to you in 48h with a new image set and all of the parameters, thanks :)

@ynjiun

ynjiun commented Sep 25, 2022

> Thoughts @szabi-luxonis on the last approach (using color cam on OAK-D-* for another stereo pair)?

Theoretically "yes", but practically there are several issues that need to be resolved:

  1. global shutter vs. rolling shutter: for stereo vision it is preferred to use global-shutter cameras, and the center color cam is a rolling shutter
  2. synchronization: I am not sure whether the center color cam is hardware-synced with the stereo pair; if not, it might inject more disparity error than it reduces.
  3. calibration and rectification: there is a need to perform a 3-way stereo calibration (L-R, L-C, C-R) and rectification. The current OAK-D-* API stack may not support this requirement.

Well, I thought about this approach before but later abandoned it because of the above three major obstacles. Perhaps you have ideas for solving them? Please share. Thanks.

@SzabolcsGergely
Collaborator

1 and 2: true. IIRC there was some work done on IMX378-OV9282 stereo a long time ago for a customer, but it didn't work out well; that's why there's no support for it, I assume.

3: L-R and C-R calibrations are enough, and those are already performed; from them, the L-C extrinsics can be calculated.
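
For illustration, a small numpy sketch of that composition; the function name and the transform convention (4x4 homogeneous matrices mapping points into the right camera's frame) are my own assumptions, not depthai's:

```python
import numpy as np

def left_to_color_extrinsics(T_left_to_right, T_color_to_right):
    """Compose left->color from the two calibrated pairs.
    With X_right = T_left_to_right @ X_left and X_right = T_color_to_right @ X_color
    (both 4x4 homogeneous transforms), it follows that
    X_color = inv(T_color_to_right) @ T_left_to_right @ X_left."""
    return np.linalg.inv(T_color_to_right) @ T_left_to_right
```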

@stephansturges
Author

> you could export them from your unit using depthai package, see example code here

These stereo images are NOT rectified.
Please find the calibration parameters for this camera below:

RGB Camera Default intrinsics...
[[816.10791015625, 0.0, 662.2203979492188], [0.0, 815.0753784179688, 396.26171875], [0.0, 0.0, 1.0]]
1280
800
RGB Camera Default intrinsics...
[[816.10791015625, 0.0, 662.2203979492188], [0.0, 815.0753784179688, 396.26171875], [0.0, 0.0, 1.0]]
1280
800
RGB Camera resized intrinsics... 3840 x 2160
[[2.44832373e+03 0.00000000e+00 1.98666113e+03]
[0.00000000e+00 2.44522607e+03 1.06878516e+03]
[0.00000000e+00 0.00000000e+00 1.00000000e+00]]
RGB Camera resized intrinsics... 4056 x 3040
[[2.58604199e+03 0.00000000e+00 2.09841089e+03]
[0.00000000e+00 2.58277026e+03 1.50815430e+03]
[0.00000000e+00 0.00000000e+00 1.00000000e+00]]
LEFT Camera Default intrinsics...
[[804.4307250976562, 0.0, 645.8418579101562], [0.0, 805.9994506835938, 394.1195983886719], [0.0, 0.0, 1.0]]
1280
800
LEFT Camera resized intrinsics... 1280 x 720
[[804.4307251 0. 645.84185791]
[ 0. 805.99945068 354.11959839]
[ 0. 0. 1. ]]
RIGHT Camera resized intrinsics... 1280 x 720
[[793.31036377 0. 649.45861816]
[ 0. 794.13549805 366.80813599]
[ 0. 0. 1. ]]
LEFT Distortion Coefficients...
k1: -9.06633186340332
k2: 64.01287078857422
p1: 0.00037014155532233417
p2: 0.004507572390139103
k3: -90.78865814208984
k4: -9.145102500915527
k5: 64.22074127197266
k6: -90.83280181884766
s1: 0.0
s2: 0.0
s3: 0.0
s4: 0.0
τx: 0.0
τy: 0.0
RIGHT Distortion Coefficients...
k1: -3.764836549758911
k2: 57.089759826660156
p1: -0.0009946062928065658
p2: 0.0036667559761554003
k3: -43.92017364501953
k4: -3.887897253036499
k5: 57.23114013671875
k6: -43.462310791015625
s1: 0.0
s2: 0.0
s3: 0.0
s4: 0.0
τx: 0.0
τy: 0.0
RGB FOV 68.7938003540039, Mono FOV 71.86000061035156
LEFT Camera stereo rectification matrix...
[[ 9.72267767e-01 3.32415744e-03 3.37760016e+01]
[-1.11641311e-02 9.85241860e-01 2.50823365e+01]
[-2.11857628e-05 -8.93426728e-08 1.01356903e+00]]
RIGHT Camera stereo rectification matrix...
[[ 9.85896693e-01 3.37381855e-03 2.13477234e+01]
[-1.13206261e-02 9.99960838e-01 7.32403233e+00]
[-2.14827378e-05 -9.06774038e-08 1.01384015e+00]]
Transformation matrix of where left Camera is W.R.T right Camera's optical center
[[ 9.99970555e-01 4.66156052e-03 6.09558960e-03 -9.00829983e+00]
[-4.66190279e-03 9.99989152e-01 4.19921271e-05 1.10289901e-02]
[-6.09532790e-03 -7.04079430e-05 9.99981403e-01 -9.86258015e-02]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Transformation matrix of where left Camera is W.R.T RGB Camera's optical center
[[ 9.99942183e-01 1.07287923e-02 6.62428502e-04 -7.58986807e+00]
[-1.07335718e-02 9.99912858e-01 7.69035192e-03 -6.37143180e-02]
[-5.79862972e-04 -7.69701786e-03 9.99970138e-01 -1.00172304e-01]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]]

@ynjiun

ynjiun commented Sep 26, 2022

@stephansturges

Great! If possible, could we capture another set of stereo rectified images (at 1280x720 resolution)? Preferably with similar repetitive texture in the scene. Thanks.
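
For reference, a minimal depthai pipeline sketch for dumping rectified left/right pairs at 720p (the stream names and on-disk filenames below are arbitrary choices, not an official example):

```python
import cv2
import depthai as dai

pipeline = dai.Pipeline()

# Mono cameras feeding the stereo node
mono_left = pipeline.create(dai.node.MonoCamera)
mono_right = pipeline.create(dai.node.MonoCamera)
mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
mono_left.setResolution(dai.MonoCameraProperties.SensorResolution.THE_720_P)
mono_right.setResolution(dai.MonoCameraProperties.SensorResolution.THE_720_P)

stereo = pipeline.create(dai.node.StereoDepth)
mono_left.out.link(stereo.left)
mono_right.out.link(stereo.right)

# Expose the rectified streams instead of (or in addition to) depth
xout_l = pipeline.create(dai.node.XLinkOut)
xout_r = pipeline.create(dai.node.XLinkOut)
xout_l.setStreamName("rectified_left")
xout_r.setStreamName("rectified_right")
stereo.rectifiedLeft.link(xout_l.input)
stereo.rectifiedRight.link(xout_r.input)

with dai.Device(pipeline) as device:
    q_l = device.getOutputQueue("rectified_left", maxSize=4, blocking=False)
    q_r = device.getOutputQueue("rectified_right", maxSize=4, blocking=False)
    for i in range(100):
        # getCvFrame() returns the rectified grayscale frame as a numpy array
        cv2.imwrite(f"rectified_left_{i:04d}.png", q_l.get().getCvFrame())
        cv2.imwrite(f"rectified_right_{i:04d}.png", q_r.get().getCvFrame())
```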

@stephansturges
Author

> @stephansturges
>
> Great! If possible, could we capture another set of stereo rectified images (at 1280x720 resolution)? Preferably with similar repetitive texture in the scene. Thanks.

I've updated the folder with a new data collection, you can find the files here (collection2.zip) https://drive.google.com/drive/folders/14JB64ApZJRZm_Zf1rJe_kx52A7xSQEUp?usp=sharing

FYI, I actually shot these in 1280x800 because this is the native resolution of the sensors on this device. This is the CM4 PoE device with the global-shutter RGB unit (OV9782 instead of the standard RGB camera).

Unfortunately, from this data collection it also looks like my RGB camera is dirty or out of focus, but I'm not in the office where the camera is at the moment so I can't correct this (I'm running everything over SSH)! I will try to fix this tomorrow and report back with a new collection.

@ynjiun

ynjiun commented Sep 27, 2022

@stephansturges

Thanks for the rectified image collection2. After cropping out the rectified border, the resulting image resolution used for the test run is 1216x720. Attached below please find the cropped rectifiedLeft_60 image and the predicted distance and disparity maps:
Distance map color code: darker = closer, brighter = farther
Disparity map color code: darker = smaller disparity, brighter = larger disparity
[image: CM4_PoE cropped rectifiedLeft_60 with predicted distance and disparity maps]
For the actual values of predicted distance and disparity, you can download the disparity.zip and distance.zip files in .npy format:
rectified_left_disparity.zip
rectified_left_distance.zip

@stephansturges
Author

@ynjiun

Thanks for running this test!
I'm having a hard time understanding the format and scale of the .npy files, but from what I can see it looks like you have continuous depth all across the frame with no holes.
[image: attempted visualization of the .npy output]
(the color mapping looks bad because I suspect I'm not using the correct scale for the values)
Is your algorithm doing anything to fill gaps here? If not, this is already a lot better than the output I get from the standard depthai algorithm.

@ynjiun

ynjiun commented Sep 27, 2022

@stephansturges

The pixel value in disparity.npy is the actual predicted disparity of the pixel, referenced to the left image.
The distance is calculated from the disparity value as below:
distance = 70 meters / disparity
To see the distance more vividly, I would recommend scaling the min-max range to 0-255 and displaying it as a graylevel image.
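
A quick numpy/matplotlib sketch of that conversion and scaling (the .npy filename is assumed from the zip name above):

```python
import numpy as np
import matplotlib.pyplot as plt

disparity = np.load("rectified_left_disparity.npy").astype(np.float32)

# distance = 70 m / disparity, as described above; guard against zero disparity
distance = np.where(disparity > 0, 70.0 / np.maximum(disparity, 1e-6), 0.0)

def to_gray(x):
    """Scale the min-max range to 0-255 for graylevel display."""
    x = x.astype(np.float32)
    return (255 * (x - x.min()) / (x.max() - x.min() + 1e-9)).astype(np.uint8)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.imshow(to_gray(disparity), cmap="gray"); ax1.set_title("disparity")
ax2.imshow(to_gray(distance), cmap="gray"); ax2.set_title("distance (70 m / disparity)")
plt.show()
```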

The algorithm actually uses a transformer to match features extracted from a deep learning model, thus no "gaps" appear. By the way, what kind of application are you developing for? Or in other words, what kind of "distance accuracy" or other requirements are you looking for?

@stephansturges
Author

@ynjiun
Thanks for the explanation of the output, I will set up a better visualization.

> The algorithm actually uses a transformer to match features extracted from a deep learning model, thus no "gaps" appear. By the way, what kind of application are you developing for? Or in other words, what kind of "distance accuracy" or other requirements are you looking for?

I am working with small quadcopter drones and other low-altitude UAVs, and I am using the stereo depth as an additional sensing method alongside a neural network that is designed to detect ground-level obstacles. For this reason I am not interested in using AI enhancements for the depth estimation, because this sensing modality is meant to be kept as "deterministic" as possible while the AI component works on RGB data, and may be enhanced with RGB+D in the future :)
You can find the AI component as a solo FOSS project here: https://github.com/stephansturges/OpenLander

Your method does seem to give great results however!

@stephansturges
Author

@ynjiun
Is your approach based on https://github.com/mli0603/stereo-transformer ? I'd be curious to try it on an actual UAV...

@stephansturges
Author

@ynjiun

> Interesting. So you use semantic segmentation for identifying a "safe landing zone"? Curious: how do you generate the ground truth? Manual labeling? Or simulation?

All the data is synthetic, so no labeling required :)

As for the stereo method: if you're willing to share the code I'd be happy to test it!

@stephansturges
Author

@ynjiun
Sure, my email address is my name @gmail .com ;)

@stephansturges
Author

Anecdotally, I get much better stereoscopy out of the CM4-PoE after recalibrating vs. the factory calibration.

[image: depth output after recalibration]

There is still a patch of "failed depth" in the cobblestones, but it is much smaller than previously.
