Questions about two-stage training in PETR #11
Thank you for sharing the code! I notice that PETR is set to be two-stage in the code, i.e., the top-K proposals from the encoder output are selected as the query embeddings as well as the initial reference points for the decoder. This is very similar to the two-stage version of Deformable-DETR.
However, in section 3.3 of the paper, the authors mention that the query embeddings are randomly initialized and learned, which is not a two-stage approach. I wonder whether the reported results are from two-stage models or one-stage ones. Also, how much improvement does the two-stage variant bring?
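(For context, here is a minimal sketch of what two-stage query generation looks like in Deformable-DETR-style code, assuming PyTorch. The module and head names below are hypothetical placeholders, not identifiers from this repo or from Deformable-DETR itself; the real Deformable-DETR also uses a sine embedding of the proposals rather than the plain linear layer shown here.)

```python
import torch
import torch.nn as nn

class TwoStageQueryInit(nn.Module):
    """Hypothetical sketch of two-stage query selection, Deformable-DETR style:
    top-k encoder proposals provide both the decoder queries and the initial
    reference points."""
    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        self.num_queries = num_queries
        self.class_head = nn.Linear(d_model, 1)   # scores each encoder token
        self.bbox_head = nn.Linear(d_model, 4)    # proposal boxes (cx, cy, w, h)
        self.pos_trans = nn.Linear(4, d_model)    # embeds boxes into query space

    def forward(self, enc_output):                # (B, N, d_model) encoder memory
        scores = self.class_head(enc_output).squeeze(-1)            # (B, N)
        proposals = self.bbox_head(enc_output).sigmoid()            # (B, N, 4)
        topk = torch.topk(scores, self.num_queries, dim=1).indices  # (B, k)
        topk_proposals = torch.gather(
            proposals, 1, topk.unsqueeze(-1).expand(-1, -1, 4))     # (B, k, 4)
        # The queries themselves are derived from the selected proposals,
        # so positional information flows into the query embeddings.
        query_embed = self.pos_trans(topk_proposals)                # (B, k, d_model)
        return query_embed, topk_proposals  # queries + initial reference points
```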
In PETR, they used randomly initialized queries without query positional encoding derived from reference points, so section 3.3 is consistent with the code of this repo, while two-stage Deformable DETR embeds its queries with their initial bounding boxes.
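(As a rough illustration only, assuming PyTorch and not taken from this repo: randomly initialized, learnable content queries are typically just an embedding table.)

```python
import torch.nn as nn

d_model, num_queries = 256, 100  # illustrative sizes
# Learnable content queries: randomly initialized and updated during
# training, with no positional encoding derived from reference points.
content_queries = nn.Embedding(num_queries, d_model)
decoder_queries = content_queries.weight  # (num_queries, d_model)
```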
Thank you for your answer! I checked the code, and I think the two-stage mode (the default setting in the code) means that the initial reference points for the decoder are initialized from the top-100 proposals, while the query embedding vectors are still randomly initialized. Hence, I think there is still some difference from section 3.3, where the locations of the initial reference points are randomly initialized and learned. This setting is very close to the recent DINO, which only uses positional information from the encoder proposals and uses randomly initialized content vectors for the queries. DINO reports that this yields better performance. Thank you for your feedback :)
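(A minimal sketch of this variant, under the same hypothetical names as the sketch above: only the reference points come from the top-k encoder proposals, while the content queries remain randomly initialized, as in DINO's mixed query selection.)

```python
import torch
import torch.nn as nn

class MixedQueryInit(nn.Module):
    """Hypothetical sketch (not this repo's code): reference points come from
    the top-k encoder proposals, content queries stay randomly initialized."""
    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        self.num_queries = num_queries
        self.class_head = nn.Linear(d_model, 1)                    # proposal scores
        self.bbox_head = nn.Linear(d_model, 4)                     # proposal boxes
        self.content_queries = nn.Embedding(num_queries, d_model)  # random, learned

    def forward(self, enc_output):                                 # (B, N, d_model)
        scores = self.class_head(enc_output).squeeze(-1)           # (B, N)
        proposals = self.bbox_head(enc_output).sigmoid()           # (B, N, 4)
        topk = torch.topk(scores, self.num_queries, dim=1).indices # (B, k)
        reference_points = torch.gather(                           # positional part
            proposals, 1, topk.unsqueeze(-1).expand(-1, -1, 4))    # (B, k, 4)
        batch = enc_output.size(0)
        query_embed = self.content_queries.weight.unsqueeze(0).expand(batch, -1, -1)
        return query_embed, reference_points
```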
I checked the code, and I think the two-stage mode (the default setting in the code) means that the initial reference points for the decoder are initialized from the top-100 proposals, while the query embedding vectors are still randomly initialized. -> Yes, I think so too!
DINO uses mixed query selection, which derives the initial reference points from the encoder proposals and keeps randomly initialized content queries, while this repo also sets randomly initialized values as the content queries. -> I have not finished reading DINO's code, but I think the idea of only passing the positional information of the proposals is quite similar here. Thank you very much!