Thanks for releasing the code. I have two questions about the algorithm implementation.
The PEX code is built on IQL (offline and online). During online fine-tuning, the algorithm uses `dist.sample` to choose w and action_2 for interaction with the environment. But I would like to know why, during evaluation, you use epsilon-greedy to choose w and action_2 instead of a purely greedy operation.
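To make sure I understand the selection step correctly, here is a minimal sketch of how I read the policy-expansion action choice, assuming a frozen offline policy and a learnable online policy whose candidates are weighted by Q. The names (`pi_offline`, `pi_online`, `q_fn`, `inv_temp`, `epsilon`) are my own placeholders, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def pex_select_action(state, pi_offline, pi_online, q_fn,
                      inv_temp=10.0, greedy=False):
    """Hypothetical sketch of PEX-style action selection over the candidate
    set {offline action, online action}. All names are assumptions."""
    a1 = pi_offline(state)   # candidate from the frozen offline policy
    a2 = pi_online(state)    # candidate from the learnable online policy
    q_vals = torch.stack([q_fn(state, a1), q_fn(state, a2)], dim=-1)

    if greedy:
        # purely greedy evaluation: always take the candidate with higher Q
        w = q_vals.argmax(dim=-1)
    else:
        # Boltzmann sampling over the two candidates (what I understand
        # "dist.sample" to do during interaction); an epsilon-greedy variant
        # would instead take the argmax with probability 1 - epsilon and a
        # random candidate otherwise.
        probs = F.softmax(inv_temp * q_vals, dim=-1)
        w = torch.distributions.Categorical(probs=probs).sample()

    # w == 0 selects the offline candidate, w == 1 the online one
    return torch.where(w.unsqueeze(-1) == 0, a1, a2)
```

My question is essentially why evaluation keeps some randomness (epsilon-greedy) in choosing w rather than using the `greedy=True` branch above.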
I am going to use SAC as the backbone algorithm for PEX, as you did in the paper. Since a SAC version is not provided, I would like to know how to transfer the offline-trained Q to online SAC. Because the Q of SAC incorporates the entropy term, I am not sure whether it is reasonable to directly use the offline-trained Q as SAC's Q and update it with the soft Bellman equation.
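Concretely, the naive transfer I have in mind looks like the sketch below: copy the offline IQL critic into the SAC critic and then continue with the entropy-regularized target. `sac`, `offline_q`, `alpha`, and the method names are placeholders I made up for illustration, not code from the repository:

```python
import copy
import torch

def warm_start_sac_critic(sac, offline_q):
    # Hypothetical warm start: initialize SAC's critic (and its target network)
    # from the offline-trained Q network.
    sac.q1.load_state_dict(offline_q.state_dict())
    sac.q1_target = copy.deepcopy(sac.q1)

def soft_bellman_target(sac, reward, next_state, done, gamma=0.99):
    with torch.no_grad():
        next_action, log_prob = sac.actor.sample(next_state)
        target_q = sac.q1_target(next_state, next_action)
        # Entropy-regularized backup. Since the offline Q was trained without
        # the -alpha * log_prob term, the early online targets would be biased
        # until the critic adapts, which is exactly what I am unsure about.
        return reward + gamma * (1.0 - done) * (target_q - sac.alpha * log_prob)
```

Is this direct reuse reasonable, or does the entropy term require some correction before the offline Q can serve as SAC's Q?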
Furthermore, I don't understand the adaptation of the SAC actor training shown in the picture below. Could you give more details about this adaptation?