Hi, I am studying the 'finding and avoiding v2' implementation to improve my own project. I noticed that there are two variables, 'step window' and 'second window', and this is the original code that defines them:
And this is how the minimum and maximum observation vectors are defined based on these two variables:
According to the comment and the corresponding code, it seems that the two 'window' variables can be used to expand the memory batch size, or even to expand the size of the two bound vectors (obs_low and obs_high) passed to the space class Box(). However, I still do not understand why it is necessary to expand the vector size and to obtain the batch size in this way. Could you please explain the right way to think about these two variables and how to use them? Many thanks!
-
Hello again @speedhawk! This piece of code is due for some refactoring, to be honest, as it breaks completely in some experiments I did with large timesteps of 1000 ms and so on.

I am not sure what you mean by batch size in this context. These two window variables are there to avoid having to use recurrent architectures in the agent: they give a simple feed-forward agent some memory by adding older observations to its input. Changing either of the two window variables changes the problem pretty fundamentally, as it modifies the observation space (i.e. the model input), and that's why the Box space is set accordingly. In case you missed it, you can take a look at this docstring that explains the windows a bit more.

Initially, through experimenting, I noticed that just adding consecutive observations (step_window) to a single observation doesn't really provide much more information to the agent. With step_window set to 2, the agent gets the latest values and the values from the previous step, theoretically giving it some information on how the values change over time. This didn't seem to improve the results that much.

That's why I added the seconds_window, which, instead of adding values from the immediately preceding step, adds values from the previous second. If, for example, the current time is 00:32:44 (hours:minutes:seconds) and step_window=1, seconds_window=1, the agent will get the latest values and the values from 00:32:43.

I hope this makes more sense! I will again convert this issue to a discussion 😄
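To make the explanation above a bit more concrete, here is a minimal Python sketch of how the two windows could enlarge the observation. It is not the repository's code: the class WindowedObservation, the helper steps_per_second, the timestep_ms parameter, and the zero-filled initial memory are all assumptions made for illustration. It only shows the general idea that the bounds are repeated once per stacked frame, and that a memory of past single-step observations is needed to look back one second.

```python
from collections import deque

import numpy as np


def steps_per_second(timestep_ms):
    """Number of controller steps needed to cover at least one second."""
    return int(np.ceil(1000 / timestep_ms))


class WindowedObservation:
    """Hypothetical helper that stacks recent and per-second-old observations."""

    def __init__(self, single_low, single_high, timestep_ms,
                 step_window=1, seconds_window=1):
        single_low = np.asarray(single_low, dtype=np.float32)
        single_high = np.asarray(single_high, dtype=np.float32)
        self.step_window = step_window
        self.seconds_window = seconds_window
        self.sps = steps_per_second(timestep_ms)

        # One copy of the single-step bounds per stacked frame. These are the
        # vectors that would be handed to Box(low=..., high=...).
        n_frames = step_window + seconds_window
        self.obs_low = np.tile(single_low, n_frames)
        self.obs_high = np.tile(single_high, n_frames)

        # The memory must reach back far enough for the oldest frame requested.
        memory_len = max(step_window, seconds_window * self.sps + 1)
        zero = np.zeros_like(single_low)  # assumption: memory starts as zeros
        self.memory = deque([zero] * memory_len, maxlen=memory_len)

    def push(self, single_obs):
        """Store the newest single-step observation and return the stacked one."""
        self.memory.append(np.asarray(single_obs, dtype=np.float32))
        # Latest step_window frames: current, previous step, and so on.
        frames = [self.memory[-1 - k] for k in range(self.step_window)]
        # One frame from roughly each of the previous seconds_window seconds.
        frames += [self.memory[-1 - s * self.sps]
                   for s in range(1, self.seconds_window + 1)]
        return np.concatenate(frames)
```

With a 32 ms timestep, step_window=2 and seconds_window=1, each call to push returns three stacked frames: the current values, the values from one step ago, and the values from 32 steps (about one second) ago. This is why changing either window changes the size of obs_low and obs_high, and with it the model input.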
The link points to a docstring, i.e. some comments on the topic under discussion, so feel free to read it before delving into the code, which is a bit convoluted to be honest.
Well, a new observation is created and passed to the agent at every timestep, so if the controller timestep is set to 32, this happens every 32 ms. So every 32 ms, with step_window=2, you get the current values and the values from 32 ms ago (i.e. the previous current values). If, in addition, you have set seconds_window=1, you also get the values from around a second ago, which means about 32 steps ago, as 32 ms * 32 steps = 1024 ms (this is why I do
int(np.ceil(1000 / self.timestep))
in the memor…