[Reopen] Trade Action #9
Hello stevexxs, okay, I see what you mean, and maybe I should have expanded more on this in my answer in #6 since it can be interpreted the wrong way. What I meant by this:
is that, per the classic MDP formulation, the order is state -> action -> reward -> next state. So for a state at t, the action the agent takes is again for the position at t. I will backtrack a bit and just describe what is going on step by step. Specifically, we can see in the code (file trail_env.py, function step) the following order:
So, the state consists of
Let me know if there is something here that is not clear. Note that I am no longer working on this project, but I still want to answer questions and fix any reported bugs. While re-reading the paper and going through the code again, this indeed seems a bit strange: the reward at t is based on the agent's value at t+1. Is this the issue you were referring to?
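In case a concrete picture helps, below is a minimal sketch of that ordering (observe the state at t, apply the action chosen for t, compute the reward from the value change up to t+1, return the next state). All names here (TrailEnvSketch, prices, agent_value, position) are illustrative assumptions, not the actual trail_env.py implementation:

```python
import numpy as np

class TrailEnvSketch:
    """Illustrative sketch only; names and structure are assumptions, not the repo's code."""

    def __init__(self, prices):
        self.prices = np.asarray(prices, dtype=float)  # one closing price per timestep
        self.t = 0
        self.position = 0          # -1 short, 0 flat, +1 long
        self.agent_value = 1.0     # normalized portfolio value

    def _state(self):
        # Observation for the current timestep t (here just the closing price).
        return np.array([self.prices[self.t]])

    def step(self, action):
        # 1. The action decided for the state at t sets the position held over (t, t+1].
        self.position = {0: -1, 1: 0, 2: 1}[action]

        # 2. Advance one timestep and update the agent's value with the price move.
        prev_value = self.agent_value
        price_return = self.prices[self.t + 1] / self.prices[self.t] - 1.0
        self.agent_value *= 1.0 + self.position * price_return
        self.t += 1

        # 3. The reward for the action at t is the value change observed at t+1.
        reward = self.agent_value - prev_value

        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done, {}
```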
Hi @Kostis-S-Z , Thanks so much for your detailed explanation.
If the action is indeed taken at timestep t,
the problem is that get_reward() calls trade(),
and in trade() the trade action is stored at position t+1, not t.
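To illustrate the indexing I mean, here is a hypothetical sketch; the names trades, trade and get_reward are used loosely and this is not the actual repo code, just the shape of the behaviour I am describing:

```python
# Hypothetical illustration of the indexing question; none of this is the repo's actual code.
num_steps = 5
trades = [None] * (num_steps + 1)  # one slot per timestep

def trade(t, action):
    # Storing under t + 1 is the behaviour being questioned:
    trades[t + 1] = action
    # storing under t instead would align the record with the decision:
    # trades[t] = action

def get_reward(t, action):
    trade(t, action)   # the trade record is written as a side effect
    return 0.0         # placeholder; the real reward computation is not shown

get_reward(2, "long")
print(trades)  # [None, None, None, 'long', None, None] -> decision at t=2 recorded at index 3
```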
Hmm okay.
Now it makes sense why this seems strange to me: initially it wasn't like that. It was indeed "reward at t is based on the agent's value at t", but I changed it due to issue #3, which makes me think that my initial code was correct. As for this:
This is not true, because plot_actions does not plot from trades but from the short and long actions, which are stored independently and correctly.
Hi,
I was about to clarify my previous comment. plot_actions does not use trades to plot the actions of the agent but the variables short_actions and long_actions, which are stored independently and correctly. trades is used optionally to mark when a trade is made. In the end (as you can see, there are no diamond markers in the plots of the paper) this was never used because it creates a very cluttered plot. Also, this variable does not affect the results, in the sense that the PnL is not based on it. It is indeed something that should be fixed, just not of high importance :)
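If it helps, the rough shape of what I am describing is something like the following hypothetical matplotlib sketch; the variable names (short_actions, long_actions, trades as lists of (timestep, price) pairs) are assumed for illustration and this is not the actual plot_actions code:

```python
import matplotlib.pyplot as plt

def plot_actions(prices, short_actions, long_actions, trades=None):
    # Plot the price series, then mark agent actions from the independently stored lists.
    plt.plot(prices, label="close")
    if long_actions:
        xs, ys = zip(*long_actions)
        plt.scatter(xs, ys, marker="^", color="green", label="long")
    if short_actions:
        xs, ys = zip(*short_actions)
        plt.scatter(xs, ys, marker="v", color="red", label="short")
    # Optional diamond markers where a trade was made (unused in the paper's plots).
    if trades:
        xs, ys = zip(*trades)
        plt.scatter(xs, ys, marker="D", color="black", label="trade")
    plt.legend()
    plt.show()
```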
Yes, you are right.
Sorry, but I don't agree with you, @Kostis-S-Z and @stevexxs.
Hi @Kostis-S-Z and @stevexxs. I would appreciate it if you could give me feedback on my comment. Thanks.
Okay, after careful consideration and review of the code and everyone's comments on this issue, the verdict is the following: everything is as it should be.
Let me know if you have any more thoughts on this; otherwise I will close the issue in a few days.
Thanks @Kostis-S-Z for your clarification, it makes sense.
I would really appreciate feedback from you on these points.
Closing this issue and transferring the discussion to #11.
I have the same question as issue #6,
according to your answer,
The actions are for timestep t+1 of the closed price. You observe a value at time t and then decide what action you should take next.
For example, if the period between each timestep is 2 hours,
you observe and make a decision about the next action, but the action is only taken 2 hours later.
Is my understanding right?
If so, I think the correct behavior should be: after making the decision for the next action at timestep t, the action should be taken immediately, even if there is some slippage, with no need to wait another period until timestep t+1, i.e. 2 hours later.
If my understanding is wrong, please let me know. Thanks!
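To make the timing concrete, here is a small made-up example (prices and timestamps invented purely for illustration, not taken from the project):

```python
# Made-up 2-hour closes to make the timing question concrete.
closes = [100.0, 101.0, 103.0]   # e.g. closes at 10:00, 12:00, 14:00

# Observe the 10:00 close and decide to go long.
# Interpretation A (wait one period): the position only starts at the 12:00 close,
# so the agent captures the 12:00 -> 14:00 move.
pnl_wait = closes[2] - closes[1]       # 2.0

# Interpretation B (act immediately, accepting slippage): the position starts
# right after the 10:00 decision and captures 10:00 -> 14:00.
pnl_immediate = closes[2] - closes[0]  # 3.0

print(pnl_wait, pnl_immediate)
```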