
Question about '3D joint position y' in Projective Attention #30

Open · Billccx opened this issue Nov 9, 2023 · 2 comments

Billccx commented Nov 9, 2023

Hello!

I'm wondering what y refers to in the Projective Attention module.

Is it the ground truth of the 3D keypoints? If so, how is it handled during inference, when no ground truth is available?

I'm looking forward to your reply; thanks in advance!

[two images attached]

twangnh (Collaborator) commented Nov 9, 2023

Hi, it means the 3D joint location prediction of the current decoder layer.

Billccx (Author) commented Nov 9, 2023

> Hi, it means the 3D joint location prediction of the current decoder layer.

Thanks for your quick reply!

I have checked the source code, and it seems that `reference_points` is y?

Does this mean that y is actually the `reference_points` output by the previous decoder layer?

```python
def forward(self, tgt, reference_points, src_views,
            src_views_with_rayembed, meta, src_spatial_shapes,
            src_level_start_index, src_valid_ratios,
            query_pos=None, src_padding_mask=None):
    output = tgt
    intermediate = []
    intermediate_reference_points = []
    for lid, layer in enumerate(self.layers):
        reference_points_input = reference_points[:, :, None]
        output = layer(output, query_pos, reference_points_input,
                       src_views, src_views_with_rayembed,
                       src_spatial_shapes,
                       src_level_start_index, meta, src_padding_mask)
        # hack implementation for iterative pose refinement:
        # each layer predicts an offset that updates the reference
        # points in inverse-sigmoid (logit) space
        if self.pose_embed is not None:
            tmp = self.pose_embed[lid](output)
            new_reference_points = tmp + inverse_sigmoid(reference_points)
            new_reference_points = new_reference_points.sigmoid()
            # detach so gradients do not flow across layers
            # through the refined points
            reference_points = new_reference_points.detach()
        if self.return_intermediate:
            intermediate.append(output)
            intermediate_reference_points.append(reference_points)
    if self.return_intermediate:
        return torch.stack(intermediate), \
            torch.stack(intermediate_reference_points)
    return output, reference_points
```

I am not sure if my understanding is correct. If it is not, could you please explain it in detail? Thank you.
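
For anyone following along, the refinement step can be reproduced in isolation. Below is a minimal sketch of one iteration; the tensor shapes and the random stand-in for `self.pose_embed[lid](output)` are placeholders, not the model's actual values:

```python
import torch

def inverse_sigmoid(x, eps=1e-5):
    # logit function, clamped for numerical stability
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))

# Placeholder shapes: 1 batch, 15 joints, 3 normalized coordinates in [0, 1].
reference_points = torch.rand(1, 15, 3)  # points from the previous layer
offset = torch.randn(1, 15, 3) * 0.1     # stand-in for self.pose_embed[lid](output)

# One refinement step: add the offset in logit space, squash back to [0, 1],
# and detach so the next layer treats the refined points as fixed inputs.
new_reference_points = (offset + inverse_sigmoid(reference_points)).sigmoid()
reference_points = new_reference_points.detach()
print(reference_points.shape)  # torch.Size([1, 15, 3])
```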
