A thinking of observation window #92

speedhawk · 2023-04-14T19:51:53Z


Hi, I am studying the 'finding and avoiding v2' implementation to improve my own project. I noticed two parameters, step_window and seconds_window, which the original code documents as follows:

...
:param step_window: How many steps of observations to add in the observation window, defaults to 1
:type step_window: int, optional
:param seconds_window: How many seconds of observations to add in the observation window, defaults to 1
:type seconds_window: int, optional
...

The minimum and maximum observation vectors are then defined from these two variables:

...
self.single_obs_size = len(single_obs_low)
obs_low = []
obs_high = []
for _ in range(self.step_window + self.seconds_window):
    obs_low.extend(single_obs_low)
    obs_high.extend(single_obs_high)
    self.obs_list.extend([0.0 for _ in range(self.single_obs_size)])
# Memory is used for creating the windows in get_observation()
self.obs_memory = [[0.0 for _ in range(self.single_obs_size)]
                   for _ in range((self.seconds_window * int(np.ceil(1000 / self.timestep))) +
                                  self.step_window)]
self.observation_counter_limit = int(np.ceil(1000 / self.timestep))
self.observation_counter = self.observation_counter_limit
...

According to the comments and the corresponding code, the two window variables seem to expand the memory batch size and also the size of the two bound vectors (obs_low and obs_high) passed to the Box() space class. However, I still do not understand why it is necessary to expand the vector size and set the batch size this way. Could you please enlighten me on the correct perspective for constructing and using these two variables? Many thanks!

Answered by tsampazk

Apr 14, 2023


tsampazk · 2023-04-14T20:21:16Z

Maintainer

Hello again @speedhawk! This piece of code is due for some refactoring, to be honest, as it totally breaks in some experiments I did with large timesteps of 1000 ms etc.

I am not sure what you mean by batch size in this context. These two window variables are there to avoid having to use recurrent architectures in the agent: they give a simple feed-forward agent some memory by adding older observations to its input. Changing either of the two window variables changes the problem pretty fundamentally, as it modifies the observation space (i.e. the model input), and that's why the Box space is sized accordingly.
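As a rough illustration of how the two windows resize the model input, here is a minimal sketch that mirrors the `obs_low` / `obs_high` loop quoted in the question. The helper name `windowed_bounds` and the three-value example observation are hypothetical and not part of the repository; plain lists stand in for the arrays handed to Box.

```python
def windowed_bounds(single_obs_low, single_obs_high, step_window=1, seconds_window=1):
    """Repeat the per-step bounds once for every slot in the observation window.

    Hypothetical helper: it mirrors the loop quoted in the question,
    which extends obs_low / obs_high (step_window + seconds_window) times.
    """
    slots = step_window + seconds_window
    obs_low = list(single_obs_low) * slots
    obs_high = list(single_obs_high) * slots
    return obs_low, obs_high


# Assumed example: a single observation made of 3 values.
low, high = windowed_bounds([0.0, -1.0, 0.0], [1.0, 1.0, 1.0],
                            step_window=2, seconds_window=1)
print(len(low), len(high))
```

With `step_window=2` and `seconds_window=1` there are three slots, so a 3-value observation becomes a 9-value model input. An agent trained with one window setting therefore cannot be reused with another without changing its input layer.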

In case you missed it, you can take a look at this docstring that explains the windows a bit more.

Initially, through experimenting, I noticed that just adding consecutive observations (step_window) to a single observation doesn't really provide much more information to the agent. With step_window set to 2, the agent gets the latest values and the values from the previous step, theoretically giving it some information on how the values change over time. This didn't seem to improve the results that much.

That's why I added the seconds_window, which adds values from the previous second instead of values from the immediately preceding step. If, for example, the current time is 00:32:44 (hours:minutes:seconds) and step_window=1, seconds_window=1, the agent will get the latest values and the values from 00:32:43.

I hope this makes more sense! I will again convert this issue to a discussion 😄

3 replies

speedhawk Apr 14, 2023
Author

Hi, I really appreciate you replying so fast 😄. I have not yet read the code you recommended; I will read it later. From your answer, I gather that the main purpose of the two variables is to enrich the observation by expanding not only its size but also its sampling frequency?
Taking your example, consider two cases: one with only step_window, and one with both windows. In the first case, with step_window=1 and a step of 32 ms, the next observation can only be extracted after 32 ms. In the second case, with a method based on seconds as well as steps, observation values within the timespan of one step can also be extracted. Is my understanding correct? It seems similar to the "frame pre-processing" of images in Atari games. 😄😄

tsampazk Apr 14, 2023
Maintainer

The link points to a docstring, i.e. some comments on the topic under discussion, so feel free to read it before delving into the code, which is a bit convoluted to be honest.

A new observation is created and passed to the agent at every timestep, so if the controller timestep is set to 32, this happens every 32 ms. So every 32 ms, with step_window=2, you get the current values and the values from 32 ms ago (i.e. the previous current values). If, in addition, you have set seconds_window=1, you also get the values from around a second ago, which means ~32 steps ago, as 32 ms * 32 steps = 1024 ms (this is why I use int(np.ceil(1000 / self.timestep)) in the memory list initialization).
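The arithmetic above can be checked with a short sketch. The function names are hypothetical; the two formulas are the ones in the code quoted in the question, with `math.ceil` standing in for `np.ceil`.

```python
import math


def steps_per_second(timestep_ms):
    """Number of controller steps that cover at least one second."""
    return int(math.ceil(1000 / timestep_ms))


def memory_length(timestep_ms, step_window, seconds_window):
    """Length of the observation memory, as in the quoted initialization."""
    return seconds_window * steps_per_second(timestep_ms) + step_window


sps = steps_per_second(32)
print(sps)                       # 1000 / 32 = 31.25, rounded up to 32
print(sps * 32)                  # 32 steps * 32 ms = 1024 ms
print(memory_length(32, 2, 1))   # 1 * 32 + 2 = 34 stored observations
```

Because of the rounding up, "one second ago" is really 1024 ms ago at a 32 ms timestep, which is why the reply says "around a second".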

The following might give you a better understanding:

v(alues) = [distance to target, angle to target, distance sensor values, etc.]

We get a new set of values vX at timestep tX:
t0 -> v0
t1 -> v1
...
t32 -> v32

Observations constructed depending on windows:

o(bservation) X with step_window=1, seconds_window=0:
o0 -> [v0]
o1 -> [v1]
...
o32 -> [v32]

observation with step_window=2, seconds_window=0:
o0 -> [v0, zeros]
o1 -> [v1, v0]
o2 -> [v2, v1]
...
o32 -> [v32, v31]

observation with step_window=2, seconds_window=1:
o0 -> [v0, zeros, zeros]
o1 -> [v1, v0, zeros]
o2 -> [v2, v1, zeros]
...
o32 -> [v32, v31, v0]
o33 -> [v33, v32, v1]
o34 -> [v34, v33, v2]

You can extrapolate from here that for seconds_window=2 you would get one additional set of values from ~64 steps ago, and so on.
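The window construction can be sketched in a few lines. This is not the repository's get_observation(), only a minimal illustration under stated assumptions: the function `build_observation`, the `history` list, and the single-value observations are hypothetical, and the indexing simply follows the examples in this reply (so any off-by-one there carries over).

```python
import math


def build_observation(history, t, step_window, seconds_window, timestep_ms=32):
    """Concatenate the values at step t with older values from the windows.

    history[i] holds the values v_i recorded at step i (assumed layout).
    Slots that reach back before step 0 are filled with zeros.
    """
    single_obs_size = len(history[0])
    zeros = [0.0] * single_obs_size
    sps = int(math.ceil(1000 / timestep_ms))  # steps in roughly one second

    # Offsets (in steps) to look back: 0, 1, ... for the step window,
    # then 1*sps, 2*sps, ... for the seconds window.
    offsets = list(range(step_window))
    offsets += [s * sps for s in range(1, seconds_window + 1)]

    observation = []
    for offset in offsets:
        index = t - offset
        observation.extend(history[index] if index >= 0 else zeros)
    return observation


# Assumed toy data: v_t = [100 + t], so real values differ from zero padding.
history = [[100.0 + t] for t in range(35)]
o0 = build_observation(history, 0, step_window=2, seconds_window=1)
o1 = build_observation(history, 1, step_window=2, seconds_window=1)
o32 = build_observation(history, 32, step_window=2, seconds_window=1)
o33 = build_observation(history, 33, step_window=2, seconds_window=1)
print(o0, o1, o32, o33)
```

Here `o32` holds v32, v31 and v0, and `o33` holds v33, v32 and v1, matching the last example. With `seconds_window=2` the offsets gain one more entry at `2 * sps`, i.e. ~64 steps back.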

Answer selected by speedhawk

tsampazk Apr 14, 2023
Maintainer

Might have an off-by-one error in the indices :P
