Thanks for releasing the code. I have two questions about the algorithm implementation.
The PEX code is built on IQL (offline and online). During online fine-tuning, the algorithm uses `dist.sample` to choose w and action_2 for interaction with the environment. But I would like to know why, during evaluation, you use epsilon-greedy to choose w and action_2 instead of a purely greedy operation.
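To make sure I understand the selection step correctly, here is a minimal sketch of how I read the policy-expansion action choice, assuming a frozen offline policy and a learnable online policy whose candidates are weighted by Q. The names (`pi_offline`, `pi_online`, `q_fn`, `inv_temp`, `epsilon`) are my own placeholders, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def pex_select_action(state, pi_offline, pi_online, q_fn,
                      inv_temp=10.0, greedy=False):
    """Hypothetical sketch of PEX-style action selection over the candidate
    set {offline action, online action}. All names are assumptions."""
    a1 = pi_offline(state)   # candidate from the frozen offline policy
    a2 = pi_online(state)    # candidate from the learnable online policy
    q_vals = torch.stack([q_fn(state, a1), q_fn(state, a2)], dim=-1)

    if greedy:
        # purely greedy evaluation: always take the candidate with higher Q
        w = q_vals.argmax(dim=-1)
    else:
        # Boltzmann sampling over the two candidates (what I understand
        # "dist.sample" to do during interaction); an epsilon-greedy variant
        # would instead take the argmax with probability 1 - epsilon and a
        # random candidate otherwise.
        probs = F.softmax(inv_temp * q_vals, dim=-1)
        w = torch.distributions.Categorical(probs=probs).sample()

    # w == 0 selects the offline candidate, w == 1 the online one
    return torch.where(w.unsqueeze(-1) == 0, a1, a2)
```

My question is essentially why evaluation keeps some randomness (epsilon-greedy) in choosing w rather than using the `greedy=True` branch above.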
I am going to use SAC as the backbone algorithm for PEX, as you did in the paper. Since a SAC version is not provided, I would like to know how to transfer the offline-trained Q to online SAC. Because the Q of SAC incorporates the entropy term, I am not sure whether it is reasonable to directly use the offline-trained Q as SAC's Q and update it with the soft Bellman equation.
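Concretely, the naive transfer I have in mind looks like the sketch below: copy the offline IQL critic into the SAC critic and then continue with the entropy-regularized target. `sac`, `offline_q`, `alpha`, and the method names are placeholders I made up for illustration, not code from the repository:

```python
import copy
import torch

def warm_start_sac_critic(sac, offline_q):
    # Hypothetical warm start: initialize SAC's critic (and its target network)
    # from the offline-trained Q network.
    sac.q1.load_state_dict(offline_q.state_dict())
    sac.q1_target = copy.deepcopy(sac.q1)

def soft_bellman_target(sac, reward, next_state, done, gamma=0.99):
    with torch.no_grad():
        next_action, log_prob = sac.actor.sample(next_state)
        target_q = sac.q1_target(next_state, next_action)
        # Entropy-regularized backup. Since the offline Q was trained without
        # the -alpha * log_prob term, the early online targets would be biased
        # until the critic adapts, which is exactly what I am unsure about.
        return reward + gamma * (1.0 - done) * (target_q - sac.alpha * log_prob)
```

Is this direct reuse reasonable, or does the entropy term require some correction before the offline Q can serve as SAC's Q?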
Furthermore, I don't understand the adaptation of the SAC actor training shown in the picture below. Could you give more details about this adaptation?