
Questions about the evaluation and the details of using the SAC algorithm as the backbone for PEX. #2

Open
whalewang410 opened this issue Dec 12, 2023 · 0 comments


whalewang410 commented Dec 12, 2023

Thanks for releasing the code. I have three questions about the algorithm implementation.

  1. The PEX code builds on IQL (offline and online). During online fine-tuning, the algorithm uses `dist.sample` to choose w and action_2 for interaction with the environment. But I want to know why, during evaluation, you choose w and action_2 with epsilon-greedy instead of a purely greedy operation.
  2. I am going to use SAC as the backbone algorithm for PEX, as you did in the paper. Since no SAC version is released, I want to know how to transfer the offline-trained Q to online SAC. Because SAC's Q incorporates the entropy term, I don't know whether it is reasonable to directly use the offline-trained Q as SAC's Q and update it with the soft Bellman equation.
  3. Furthermore, I don't understand the adaptation for the SAC actor training, shown in the picture below. Could you give me more details about the adaptation?
    [screenshot: adapted SAC actor-training objective from the paper]
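To make questions 1 and 2 concrete, here is a minimal sketch (not taken from the PEX repo; the function names `pex_select` and `soft_bellman_target` are illustrative) of the composite action selection in its three variants — sampling as in training, purely greedy, and epsilon-greedy as in the released evaluation code — plus the soft Bellman target into which an offline-trained Q would be plugged:

```python
import numpy as np

def pex_select(q_values, temperature=1.0, mode="sample",
               epsilon=0.1, rng=None):
    """Pick which candidate action to execute.

    q_values holds Q(s, a_k) for the actions proposed by the frozen
    offline policy and the online policy.  "sample" mirrors training
    (w ~ Categorical(softmax(Q / temperature))); "greedy" and
    "eps_greedy" are the two evaluation variants discussed above.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    if mode == "sample":
        return int(rng.choice(len(probs), p=probs))
    if mode == "eps_greedy" and rng.random() < epsilon:
        return int(rng.integers(len(probs)))
    return int(np.argmax(probs))            # greedy choice

def soft_bellman_target(reward, done, q_next, log_pi_next,
                        gamma=0.99, alpha=0.2):
    """Soft Bellman target: y = r + gamma*(1-d)*[Q(s',a') - alpha*log pi(a'|s')].

    alpha is SAC's entropy temperature; question 2 asks whether an
    offline-trained Q (learned without the entropy term) can be
    initialized here and then updated with this target.
    """
    return reward + gamma * (1.0 - done) * (q_next - alpha * log_pi_next)
```

With `epsilon=0`, `eps_greedy` reduces to the purely greedy operation, which is why the choice of epsilon-greedy at evaluation time is worth asking about.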