
Transformed Depth Image - Depth value experimental setup #221

Open
nliedmey opened this issue Dec 12, 2023 · 1 comment
@nliedmey
Hello,

While generating 3D coordinates in the RGB camera frame from 2D RGB image-plane coordinates [x, y], I gained some insights about the transformed_depth image.

In many other issues (Example 1, Example 2), the depth value returned by the transformed_depth image is described as the Euclidean distance between the object and the depth sensor. Since I came across some depth measurements that were inconsistent with this hypothesis, I did some more testing.

Three objects were placed at different positions along a line 1 m in front of the Azure Kinect, running parallel to the camera's Z = 0 plane. I captured the scene and generated transformed_depth images. According to the hypothesis, only the object in the middle of the line, directly in front of the sensor and orthogonal to the Z = 0 plane, should return a depth value of ~1000 mm. The two other objects, placed about 30 cm to either side of the middle object, should return values larger than ~1000 mm, because their Euclidean distance to the sensor is actually greater than 1 m.

In fact, this is not the case. All three objects returned a depth value of ~1000 mm. This leads to the assumption that the returned depth value is the distance between the object and the Z = 0 plane of the Azure Kinect, i.e. the Z-coordinate in camera space.
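
As a quick sanity check (a minimal sketch with assumed object positions, not measurements from my setup), here is what a true Euclidean-range sensor would report for the three objects versus a Z-depth sensor:

```python
import math

# Assumed object positions in camera space (metres): X to the side, Z forward.
objects = {"left": (-0.30, 1.00), "middle": (0.00, 1.00), "right": (0.30, 1.00)}

for name, (x, z) in objects.items():
    euclidean = math.sqrt(x**2 + z**2)  # range from the sensor origin
    print(f"{name:>6}: Z-depth = {z*1000:.0f} mm, Euclidean range = {euclidean*1000:.0f} mm")

# middle: 1000 mm vs 1000 mm; left/right: 1000 mm vs ~1044 mm.
# The observed ~1000 mm for all three objects matches the Z-depth interpretation.
```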

What I am asking now is: how do I generate proper 3D coordinates from given [x, y] image-plane coordinates and a Z-value that is the distance to the Z = 0 plane rather than the true Euclidean depth of the object? Is the calibration.convert_2d_to_3d() function aware of this? It often returned 3D values that seemed skewed to me.
Because of this, I switched to the pinhole model and did manual computations like this:

x_3d = (z * u - z * cx) / fx
y_3d = (z * v - z * cy) / fy

Anyway, since "z" is not the Euclidean distance to the origin of the coordinate system, I think these computations do not work properly.
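
For illustration, a minimal numeric check (with made-up intrinsics and pixel values, not real Kinect calibration) that the formulas above are just a factored form of the standard pinhole back-projection (u - cx) * z / fx:

```python
# Made-up intrinsics and pixel coordinates, purely for the algebraic comparison.
fx, fy, cx, cy = 600.0, 600.0, 640.0, 360.0
u, v, z = 700.0, 400.0, 1.0  # pixel coordinates and Z-depth in metres

x_a = (z * u - z * cx) / fx   # formula from the post
x_b = (u - cx) * z / fx       # standard pinhole form
assert abs(x_a - x_b) < 1e-12 # same expression, just factored differently
```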

Has anyone come across comparable problems and found a solution?

@maturk

maturk commented Jan 4, 2024

@nliedmey I think your formulas are correct. The real XYZ coordinate in camera space, given a z-depth sensor reading, would be

X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy 
Z = Z

Where Z is the sensor z-depth at the pixel location (u,v) on the depth map. cx and cy are also in pixels. You could then transform this point from camera space to some world coordinates if you have the camera2world transform.
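
A minimal NumPy sketch of this back-projection plus an optional camera-to-world transform (the intrinsics and the transform below are placeholders, not actual Kinect calibration values):

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with Z-depth z into camera space (same unit as z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Placeholder color-camera intrinsics; replace with values from the device calibration.
fx, fy, cx, cy = 600.0, 600.0, 640.0, 360.0

# Example: pixel (700, 400) with a transformed_depth value of 1000 mm.
point_cam = backproject(700, 400, 1000.0, fx, fy, cx, cy)  # millimetres, camera space

# Optional: transform to world coordinates with a known 4x4 camera-to-world matrix.
T_cam2world = np.eye(4)  # placeholder transform
point_world = (T_cam2world @ np.append(point_cam, 1.0))[:3]
print(point_cam, point_world)
```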
