Representation of the condition #10

masashi-hatano · 2024-12-11T15:54:33Z

Hello, first of all, I appreciate your great work.

I have a minor question about the representation of the conditions.
In prior works like EgoEgo, the 6D representation is adopted for the rotation matrix, and the full 3x4 transform (rotation plus translation) is represented as a 9-dimensional vector (6+3).
Is there a specific reason you didn't follow this, and instead used all elements of the matrix concatenated with a Fourier encoding of the same matrix?

Thanks in advance.
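
For readers following along, the two condition layouts contrasted in this question can be sketched roughly as below. This is only an illustration, not the repository's code: the function names, the choice of the first two rotation columns for the 6D part, the power-of-two frequencies, and `num_freqs=4` are all assumptions made for the example.

```python
import numpy as np


def condition_9d(T: np.ndarray) -> np.ndarray:
    """Hypothetical 6D-rotation + translation layout for a (3, 4) transform."""
    R, t = T[:, :3], T[:, 3]
    # Assumed convention: the 6D part is the first two columns of R.
    return np.concatenate([R[:, 0], R[:, 1], t])  # shape (9,)


def fourier_encode(x: np.ndarray, num_freqs: int = 4) -> np.ndarray:
    """Concatenate x with sin/cos features at assumed frequencies 1, 2, 4, ..."""
    freqs = 2.0 ** np.arange(num_freqs)  # (F,)
    scaled = x[..., None] * freqs  # (..., D, F)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (..., D, 2F)
    return np.concatenate([x, enc.reshape(*x.shape[:-1], -1)], axis=-1)


def condition_full(T: np.ndarray, num_freqs: int = 4) -> np.ndarray:
    """Hypothetical layout: all 12 matrix entries plus their Fourier features."""
    return fourier_encode(T.reshape(-1), num_freqs)


# Identity rotation with zero translation as a toy input.
T_example = np.concatenate([np.eye(3), np.zeros((3, 1))], axis=1)
vec_9d = condition_9d(T_example)
vec_full = condition_full(T_example)
```

With these assumed settings the first layout has 9 entries and the second has 12 raw entries plus 12 × 2 × 4 Fourier features.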

brentyi · 2024-12-11T18:34:08Z

Hi Masashi!

To be honest this is just a minor personal taste thing, basically an aesthetic choice. I'd be surprised if it matters in practice.

- As network input: the 6D and 9D rotation representations have the same smoothness/uniqueness advantages over quaternions, axis-angle, etc. 6D is more compact, but to me that seems more likely to hurt than help learning.
  - Fourier encoding: this was just a thing I tried because I thought it made sense (relevant paper). I saw the loss go down so I kept it.
- As network output: at test time, either representation needs to be projected to an orthogonal matrix. For 6D the standard is Gram-Schmidt; for 9D we use SVD. I like the latter better because (1) Gram-Schmidt requires an "axis ordering" that will impact results and feels arbitrary and (2) there's a symmetry between SVD (at test time) being the L2-optimal projection and our loss (at train time) being L2 that feels simpler/more correct/more elegant.
  - The downside of SVD is that it's slower, but this isn't a significant bottleneck for us.

The 9D representation + SVD also requires a little bit less code.
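
A minimal NumPy sketch of the two projections discussed above, for illustration only (this is not the repository's implementation; the function names and the noisy test matrix are made up for the example). It shows that the SVD projection is never farther from the raw matrix in Frobenius norm than Gram-Schmidt, and that Gram-Schmidt's result depends on which axis is processed first.

```python
import numpy as np


def project_svd(m: np.ndarray) -> np.ndarray:
    """Closest rotation to a 3x3 matrix in the Frobenius (L2) sense."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def project_gram_schmidt(a1: np.ndarray, a2: np.ndarray) -> np.ndarray:
    """Rotation from two 3-vectors; the direction of a1 is kept exactly."""
    b1 = a1 / np.linalg.norm(a1)
    a2_perp = a2 - np.dot(b1, a2) * b1  # remove the component along b1
    b2 = a2_perp / np.linalg.norm(a2_perp)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)


# A hypothetical network output: a rotation corrupted by noise.
rng = np.random.default_rng(0)
noisy = np.eye(3) + 0.1 * rng.standard_normal((3, 3))

r_svd = project_svd(noisy)

# Gram-Schmidt with the x column first, then y.
r_gs_xy = project_gram_schmidt(noisy[:, 0], noisy[:, 1])

# Gram-Schmidt with the y column first, then x, reordered back to (x, y, z).
tmp = project_gram_schmidt(noisy[:, 1], noisy[:, 0])
r_gs_yx = np.stack([tmp[:, 1], tmp[:, 0], -tmp[:, 2]], axis=1)

err_svd = np.linalg.norm(r_svd - noisy)
err_gs_xy = np.linalg.norm(r_gs_xy - noisy)
err_gs_yx = np.linalg.norm(r_gs_yx - noisy)
```

Here `r_gs_xy` and `r_gs_yx` are both valid rotations but generally differ, which is the "axis ordering" dependence, while `err_svd` is a lower bound on both Gram-Schmidt errors.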

masashi-hatano · 2024-12-11T19:12:58Z

Thank you so much!

Fourier encoding seems very interesting, thanks for sharing the paper.

By the way, Eq.(7) in the paper seems wrong if I don't misunderstand. I think $R_{\text{world}, \text{cpf}}^{t}$ is unnecessary as this won't keep aligning the canonical frame's local z-axis parallel to the world z-axis. Just the first term $R_{z}(.)$ represents the rotation from world to canonical.

brentyi · 2024-12-11T19:44:44Z

> By the way, Eq.(7) in the paper seems wrong if I don't misunderstand. I think $R_{\text{world}, \text{cpf}}^{t}$ is unnecessary as this won't keep aligning the canonical frame's local z-axis parallel to the world z-axis. Just the first term $R_{z}(.)$ represents the rotation from world to canonical.

You're totally right! I'll fix it.
