[Request] Robust Multiple-Baseline Stereo matching for robust stereoscopy in repetitive-pattern environments (RMBS-stereo) #675
Comments
@stephansturges
Thanks for offering to do the test! I moved the camera a few times during capture to provide different examples, but the main cobblestone area was always a "hole" in the depth map. For reference, this was the stereo output during capture of the mono images from the depthai pipeline:
@stephansturges
Is there a specific way to retrieve these with the depthai package? Or should I perform an OpenCV calibration to get these params?
You could export them from your unit using the depthai package; see the example code here.
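For reference, a minimal sketch of reading the on-device calibration with the depthai API (assuming a connected OAK device; socket names and resolutions may differ by device and API version):

```python
import depthai as dai

# Minimal sketch: read the intrinsics/extrinsics stored on the device EEPROM.
# Assumes an OAK device is connected over USB/PoE.
with dai.Device() as device:
    calib = device.readCalibration()

    # 3x3 intrinsic matrix for the left mono camera, scaled to 1280x800
    M_left = calib.getCameraIntrinsics(dai.CameraBoardSocket.LEFT, 1280, 800)

    # 4x4 transform between the left and right mono cameras
    T_left_right = calib.getCameraExtrinsics(dai.CameraBoardSocket.LEFT,
                                             dai.CameraBoardSocket.RIGHT)

    # Stereo baseline (centimeters by default)
    baseline_cm = calib.getBaselineDistance()

    print(M_left, T_left_right, baseline_cm)
```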
@stephansturges
Thoughts @szabi-luxonis on the last approach (using the color cam on OAK-D-* for another stereo pair)?
These images are not rectified, as far as I can remember. I will get back to you in 48h with a new image set and all of the parameters, thanks :)
Theoretically "yes", but practically there are several issues that need to be resolved:
Well, I thought about this approach before, but later abandoned it because of the above three major obstacles. Perhaps you might have ideas for solving the above 3 issues? Please share. Thanks.
1 and 2: true. IIRC there was some work done on IMX378-OV9282 stereo a long time ago, for a customer, but it didn't work out well, which I assume is why there's no support for it. 3: L-R and C-R calibration are enough, and both are already performed; from those, the L-C extrinsics can be calculated.
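For illustration, a small numpy sketch of how the L-C transform could be composed from the calibrated L-R and C-R extrinsics, reusing the `calib` handle from the sketch above. The frame convention (T_a_b maps points from frame a into frame b) is an assumption; the actual convention should be checked against the depthai documentation:

```python
import numpy as np
import depthai as dai

# Sketch: derive the left-to-color transform from the calibrated
# left-to-right and color-to-right 4x4 homogeneous transforms.
T_left_right  = np.asarray(calib.getCameraExtrinsics(dai.CameraBoardSocket.LEFT,
                                                     dai.CameraBoardSocket.RIGHT))
T_color_right = np.asarray(calib.getCameraExtrinsics(dai.CameraBoardSocket.RGB,
                                                     dai.CameraBoardSocket.RIGHT))

# left -> right, then right -> color (inverse of color -> right)
T_left_color = np.linalg.inv(T_color_right) @ T_left_right
```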
These stereo images are NOT rectified. RGB Camera Default intrinsics...
Great! If possible, could we capture another set of stereo rectified images at 1280x720 resolution? Preferably with similar repetitive texture in the scene. Thanks.
I've updated the folder with a new data collection; you can find the files here (collection2.zip): https://drive.google.com/drive/folders/14JB64ApZJRZm_Zf1rJe_kx52A7xSQEUp?usp=sharing FYI, I actually shot these in 1280x800 because that is the native resolution of the sensors on this device. This is the CM4 PoE device with a global shutter RGB unit (OV9782) instead of the standard RGB camera. Unfortunately it also looks from this data collection like my RGB camera is dirty or out of focus, but I'm not in the office where the camera is at the moment so I can't correct this (I'm running everything over SSH)! I will try to fix this tomorrow and report back with a new collection.
Thanks for the rectified image collection2. After cropping out the rectification border, the resulting image resolution used for the test run is 1216x720. Attached below please find the cropped rectifiedLeft_60 image and the predicted distance and disparity maps:
The pixel value in disparity.npy is the actual predicted disparity of the pixel referenced to the left image. The algorithm actually uses a transformer to match features extracted from a deep learning model, thus no "gaps" appear. By the way, what kind of application are you developing for? Or in other words, what kind of "distance accuracy" or other requirements are you looking for?
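For anyone reproducing this, a tiny sketch of turning such a disparity map into metric distance via Z = f * B / d, assuming the rectified focal length in pixels and the baseline are known (the values below are placeholders, not the actual calibration of this device):

```python
import numpy as np

# Sketch: convert a predicted disparity map (pixels, referenced to the
# left rectified image) into metric depth with Z = f * B / d.
disparity = np.load("disparity.npy")   # disparity in pixels
focal_px = 800.0                       # placeholder: rectified focal length in pixels
baseline_m = 0.075                     # placeholder: stereo baseline in meters

valid = disparity > 0
depth_m = np.zeros_like(disparity, dtype=np.float32)
depth_m[valid] = focal_px * baseline_m / disparity[valid]
```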
@ynjiun
I am working with small quadcopter drones and other low-altitude UAVs, and I am using the stereo depth as an additional sensing method alongside a neural network that is designed to detect ground-level obstacles. For this reason I am not interested in using AI enhancements for the depth estimation, because this sensing modality is meant to be kept as "deterministic" as possible while the AI component works on RGB data, and may be enhanced with RGB+D in the future :) Your method does seem to give great results, however!
@ynjiun
Interesting. So you use semantic segmentation for identifying a "safe landing zone"? Curious: how do you generate the ground truth? Manual labeling? Or simulation?

All the data is synthetic, so no labeling required :) As for the stereo method: if you're willing to share the code I'd be happy to test it!
Start with Why?
When using stereo devices "in the wild" on human-made objects, it is extremely common to encounter repetitive patterns on the objects you want to retrieve depth from. Brick walls, cobblestone roads, roof shingles, tiling, etc. often constitute the majority of a scene in an urban environment.
Unfortunately, stereo matching as implemented in DepthAI is not particularly good at retrieving depth from these types of patterns, for reasons that are well documented in the history of development of stereoscopy algorithms. See here for more information:
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.174.152&rep=rep1&type=pdf
This can be observed with current Luxonis devices, such as in this scene where a CM4-PoE device is pointed down at cobblestones from a height of approximately 10 m. Notice that the depth estimation is inconclusive in the area that shows the most repetition. In the current state of the DepthAI library this cannot be solved by tuning parameters in the stereo-matching cost function.
(see more examples here: https://discuss.luxonis.com/d/875-depth-parameters-configuration-testing-code/4 )
How can this be solved?
There are different approaches to the solution, but the most promising seems to be using multiple-baseline cameras. See this paper for one implementation and details:
https://www.researchgate.net/publication/3916660_A_robust_stereo-matching_algorithm_using_multiple-baseline_cameras
This has also been mentioned by @ynjiun in the context of retrieving additional information from the planned Long-Range device, in this thread on the hardware repo: luxonis/depthai-hardware#247
In the context of DepthAI, the implementation would require using the RGB camera on OAK-D devices in a desaturated mode as a third mono source, and using it to calculate additional disparity maps with one or both of the mono sensors in order to refine the local variance value for each pixel in a sliding window. A rough sketch of the underlying multi-baseline idea follows below.
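As an illustration of why multiple baselines help (this is not DepthAI code, just a minimal numpy/OpenCV sketch of the sum-of-SSD-over-inverse-depth idea from the Okutomi and Kanade paper), the following assumes all images are already rectified and row-aligned, uses crude integer shifts with no subpixel handling, and treats the window size and candidate inverse depths as free parameters:

```python
import cv2
import numpy as np

def multi_baseline_inverse_depth(ref, others, baselines, focal_px, inv_depths, win=5):
    """Illustrative sketch of SSSD-in-inverse-distance (Okutomi & Kanade):
    for each candidate inverse depth, shift every secondary rectified image
    by its baseline-dependent disparity, accumulate windowed SSD costs over
    all camera pairs, and take the per-pixel minimum of the summed cost.
    Because the cost volumes are indexed by inverse depth rather than raw
    disparity, the false minima caused by repetitive textures do not line up
    across different baselines and are suppressed in the sum.
    ref, others: float32 grayscale arrays; baselines in meters."""
    h, w = ref.shape
    kernel = np.ones((win, win), np.float32) / (win * win)
    cost = np.zeros((len(inv_depths), h, w), np.float32)
    for j, inv_z in enumerate(inv_depths):
        for img, b in zip(others, baselines):
            d = int(round(focal_px * b * inv_z))   # disparity implied by this baseline
            shifted = np.roll(img, d, axis=1)      # crude integer shift (wraps at edges)
            cost[j] += cv2.filter2D((ref - shifted) ** 2, -1, kernel)
    best = np.argmin(cost, axis=0)                 # index of lowest summed cost
    return np.asarray(inv_depths, np.float32)[best]
```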
It would be very useful for the type of work I am doing to see an evolution to this type of stereo matching to solve the problem of repeating textures, and I'm sure it would benefit many in the community!