Representation of the condition #10

Open
masashi-hatano opened this issue Dec 11, 2024 · 3 comments

@masashi-hatano

Hello, and first of all, thank you for your great work.

I have a minor question about the representation of the conditions.
In prior works like EgoEgo, the 6D representation is adopted for the rotation matrix, so the full 3x4 matrix (rotation plus translation) is represented as a 9-dimensional vector (6+3).
Is there a specific reason you didn't follow this, and instead used all elements of the matrix concatenated with a Fourier encoding of the same matrix?

Thanks in advance.

@brentyi
Owner

brentyi commented Dec 11, 2024

Hi Masashi!

To be honest this is just a minor personal taste thing, basically an aesthetic choice. I'd be surprised if it matters in practice.

  • As network input: the 6D and 9D rotation representations have the same smoothness/uniqueness advantages over quaternions, axis-angle, etc. 6D is more compact, but to me that seems more likely to hurt than help learning.
    • Fourier encoding: this was just a thing I tried because I thought it made sense (relevant paper). I saw the loss go down so I kept it.
  • As network output: at test time for either representation we need to project to an orthogonal matrix. For 6D the standard is Gram-Schmidt, for 9D we use SVD. I like the latter better because (1) Gram-Schmidt requires an "axis ordering" that will impact results and feels arbitrary and (2) there's a symmetry between SVD (at test time) being the L2-optimal projection and our loss (at train time) being L2 that feels simpler/more correct/more elegant.
    • The downside of SVD is it's slower, but this isn't a significant bottleneck for us.
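
For readers unfamiliar with the Fourier-encoding bullet above, here is a minimal sketch of what "raw matrix elements concatenated with a Fourier encoding of the same elements" could look like. The function name `fourier_encode`, the power-of-two frequencies, and the number of bands are all assumptions made for illustration; they are not taken from the repository.

```python
import numpy as np

def fourier_encode(x, num_freqs=4):
    """Concatenate x with sin/cos features of x at several frequencies.

    The frequencies 2^0 ... 2^(num_freqs - 1) are an illustrative assumption,
    not the values used in the actual project.
    """
    freqs = 2.0 ** np.arange(num_freqs)       # shape (F,)
    scaled = x[..., None] * freqs             # shape (..., D, F)
    feats = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (..., D, 2F)
    feats = feats.reshape(*x.shape[:-1], -1)  # flatten to (..., D * 2F)
    return np.concatenate([x, feats], axis=-1)

# Flattened 3x4 pose: 9 rotation entries followed by 3 translation entries.
pose = np.concatenate([np.eye(3).reshape(-1), np.zeros(3)])
encoded = fourier_encode(pose, num_freqs=4)
print(encoded.shape)
```

The raw values are kept at the front of the vector, so the network still sees the original matrix entries alongside the sinusoidal features.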

The 9D representation + SVD also requires a little bit less code.
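
To make the Gram-Schmidt vs. SVD comparison concrete, here is a rough NumPy sketch (not the repository's code). It assumes the 6D representation is the first two columns of the matrix; the helper names `project_svd` and `project_gram_schmidt` and the example matrix are hypothetical.

```python
import numpy as np

def project_svd(m):
    """Project a 3x3 matrix to the nearest rotation in the Frobenius (L2) sense."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

def project_gram_schmidt(m, first="x"):
    """Build a rotation from the first two columns of m via Gram-Schmidt.

    `first` selects which column is normalized first; the other column is
    then orthogonalized against it. This is the "axis ordering" choice.
    """
    x, y = m[:, 0], m[:, 1]
    if first == "x":
        x = x / np.linalg.norm(x)
        y = y - np.dot(x, y) * x
        y = y / np.linalg.norm(y)
    else:
        y = y / np.linalg.norm(y)
        x = x - np.dot(y, x) * y
        x = x / np.linalg.norm(x)
    z = np.cross(x, y)  # third axis completes a right-handed frame
    return np.stack([x, y, z], axis=1)

# A made-up "network output": close to a rotation, but not orthogonal.
noisy = np.array([
    [1.0, 0.2, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.1, 1.1],
])

r_svd = project_svd(noisy)
r_xy = project_gram_schmidt(noisy, first="x")
r_yx = project_gram_schmidt(noisy, first="y")

# The two Gram-Schmidt orderings give different rotations for the same input.
print(np.linalg.norm(r_xy - r_yx))
# The SVD projection is at least as close to the input (Frobenius norm).
print(np.linalg.norm(noisy - r_svd), np.linalg.norm(noisy - r_xy))
```

All three outputs are valid rotations; the difference is that the SVD result does not depend on picking an axis order and is the closest rotation to the input in the L2 sense.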

@masashi-hatano
Author

Thank you so much!

Fourier encoding seems very interesting, thanks for sharing the paper.

By the way, Eq.(7) in the paper seems wrong if I don't misunderstand. I think $R_{\text{world}, \text{cpf}}^{t}$ is unnecessary as this won't keep aligning the canonical frame's local z-axis parallel to the world z-axis. Just the first term $R_{z}(.)$ represents the rotation from world to canonical.

@brentyi
Owner

brentyi commented Dec 11, 2024

> By the way, Eq.(7) in the paper seems wrong if I don't misunderstand. I think $R_{\text{world}, \text{cpf}}^{t}$ is unnecessary as this won't keep aligning the canonical frame's local z-axis parallel to the world z-axis. Just the first term $R_{z}(.)$ represents the rotation from world to canonical.

You're totally right! I'll fix it.
