I have searched the MuZero issues and found no similar bug report.
🐛 Describe the bug
I'm confused about the action encoding. In the MuZero paper, the input to the dynamics function is the hidden state concatenated with a representation of the action for the transition. As I understand the paper, a normal action (playing a stone on the board) is encoded as an all-zero plane with a single one at the position of the played stone. For example, with action_space_size = 5 and action = 2, the action could be encoded as [0, 1, 0, 0, 0]. In this code, however, the action is encoded as [0.4, 0.4, 0.4, 0.4, 0.4], i.e. every entry is action / action_space_size.
Am I misunderstanding something? Please tell me which encoding is correct and why the code is written this way. Thanks.
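To make the difference concrete, here is a minimal pure-Python sketch of the two encodings for action_space_size = 5 and action = 2. The helper names `one_hot` and `scaled` are hypothetical and exist only for this illustration. The sketch assumes zero-based action indices, so the one lands at index 2; the vector [0, 1, 0, 0, 0] quoted above corresponds to counting positions from 1.

```python
from typing import List


def one_hot(action: int, action_space_size: int) -> List[float]:
    """One-hot vector with a single 1.0 at index `action` (zero-based)."""
    return [1.0 if i == action else 0.0 for i in range(action_space_size)]


def scaled(action: int, action_space_size: int, length: int) -> List[float]:
    """Constant vector whose every entry is action / action_space_size."""
    return [action / action_space_size] * length


action_space_size = 5
action = 2

one_hot_encoding = one_hot(action, action_space_size)
scaled_encoding = scaled(action, action_space_size, length=5)

print(one_hot_encoding)  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(scaled_encoding)   # [0.4, 0.4, 0.4, 0.4, 0.4]
```

The one-hot form gives each action its own position, while the scaled form squeezes the action index into a single number between 0 and 1 that is repeated across all entries.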
Add an example
action_one_hot = (
    torch.ones(
        (
            encoded_state.shape[0],
            1,
            encoded_state.shape[2],
            encoded_state.shape[3],
        )
    )
    .to(action.device)
    .float()
)
action_one_hot = (
    action[:, :, None, None] * action_one_hot / self.action_space_size
)
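For comparison, the following sketch runs the computation from the snippet above on small dummy tensors and places it next to a one-hot plane encoding. The tensor sizes, the `action` shape of (batch, 1), and the tiled one-hot variant are assumptions made for illustration; the one-hot variant is only one possible reading of the paper's description, not code taken from this repository.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes chosen only for this illustration.
batch, channels, height, width = 2, 8, 3, 3
action_space_size = 5

encoded_state = torch.zeros(batch, channels, height, width)
action = torch.tensor([[2], [4]])  # assumed shape: (batch, 1), integer indices

# Encoding as in the quoted snippet: one plane per sample,
# filled with action / action_space_size.
ones = torch.ones((batch, 1, height, width), device=action.device)
scaled_plane = action[:, :, None, None] * ones / action_space_size

# Alternative (assumption): one plane per action, all ones for the
# chosen action and all zeros for the others.
one_hot = F.one_hot(action.squeeze(1), num_classes=action_space_size).float()
one_hot_planes = one_hot[:, :, None, None].expand(-1, -1, height, width)

# Either encoding is concatenated with the hidden state along channels.
x_scaled = torch.cat((encoded_state, scaled_plane), dim=1)
x_one_hot = torch.cat((encoded_state, one_hot_planes), dim=1)

print(scaled_plane[0, 0, 0, 0].item())  # 2 / 5, shown as roughly 0.4
print(x_scaled.shape)   # torch.Size([2, 9, 3, 3])
print(x_one_hot.shape)  # torch.Size([2, 13, 3, 3])
```

The scaled plane adds one input channel regardless of the number of actions, whereas the tiled one-hot variant adds action_space_size channels. Despite its name, `action_one_hot` in the quoted snippet holds the scaled plane, not a one-hot tensor.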
Environment
No response
Minimal Reproducible Example
No response
Additional
No response