Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another implementation of Randomized #232

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from
Draft

Another implementation of Randomized #232

wants to merge 32 commits into from

Conversation

wasowski
Copy link
Member

@wasowski wasowski commented Jan 12, 2024

@mohsen-ghaffari1992 in a push to try to resolve bug #59, I have proposed to replace Randomized with Randomized2 based on probula. This has actually uncovered some design flaw with Randomized (it was not properly abstracted so its design details were leaking to symsim code). As a result this required a massive amount of changes.

I had no time to try the entire test suite, so I do not even know if this is correct (and it may have positive or negative impact on convergence)

But simple maze seems to work, and I was able to run a million episodes. So try using this branch for your experiments.

@wasowski wasowski marked this pull request as draft January 12, 2024 23:31
@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024

@wasowski
Thanks!
I tested a maze 100x100 for 1000 episodes with timeout=2000, it perfectly is running. I still do not know how good is the result but it is very hopeful that now we can run both long trials and high number of episodes.

@wasowski
Copy link
Member Author

@mohsen-ghaffari1992
there are many tests failing so I am trying to chase bugs and push corrections to this branch. I think we will merge this branch after the deadline.

It should only be created in top-level executable files (which in this
project means Spec files)
@wasowski
Copy link
Member Author

concrete.simplebandit.Bandit is an Agent.agent.observe (step (s) (a)._1)  ObservableState *** FAILED *** (44 milliseconds)
[info]   ArrayIndexOutOfBoundsException was thrown during property evaluation.

Gaussian Simple Bandit seems to fail because of a bug in spire (issue #233). Do we need Gaussian for the paper anywhere @mohsen-ghaffari1992 ?

@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

@wasowski
Copy link
Member Author

wasowski commented Jan 13, 2024

OK. Then we fix this later.

Now cartpole was failing because of an assertion, but as far as I can see the assertion was wrong (the last requirement for state invariant was

require (pv <= PvMin)

I switch this to PvMax and now seems to work, although very strange how this mistake survived so long.

Now cartpole still uses memory intensively. Even with 10K episodes I am running out of memory.

  1. Do we use cartpole in the paper?
  2. With what number of episodes?
  3. Did it work before or also timed out?

So that we can diagnostic information in more situations, not only when
moving (it was previously placed in move)
It was not excluding the right point margin as the spec would indicate
It seems that it was forgetting initializing at the max edges of the board
@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

@wasowski
Copy link
Member Author

but I think there is something buggy, which means me question on all results. WindyGrid and CliffWalking are also not able to complete even one episode, and these should work just fine! CliffWalking works very well in ADPRO on an extremely similar design, so something is fishy.

@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

@wasowski
Copy link
Member Author

wasowski commented Jan 13, 2024

Hmm... this is weird. The test for validity (CliffWalkingIsAgent) is passing, and we only produce states within the board? Also I moved the assertions about the states being legal from move to the constructor in CWState, and it never fails , which should indicate that these states are all good? How can you see that this is happening?

(I suspected the dreaded function tailRecM in Randomized2, but looking at it for several hours, I start to believe that it is correct).

@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

@wasowski
Copy link
Member Author

wasowski commented Jan 13, 2024

But maybe I do not understand what valid states are.

  // from CliffWalking
  def initialize: Randomized2[CWState] = { for
    x <- Randomized2.between (0, BoardWidth  + 1)
    y <- Randomized2.between (0, BoardHeight + 1)
    s = CWState (x, y) 
  yield s }.filter { !this.isFinal (_) }

I do not understand how this can create states outside the board. Also I have assertions now checking when the state is created in CWState, so if I try to create an illegal state everything crashes with an exception, but I do not see these exceptions. For example:

scala> CWState(20,20)
java.lang.IllegalArgumentException: requirement failed: Out-Of-Width x: ¬(20  11)
  at scala.Predef$.require(Predef.scala:337)
  at symsim.examples.concrete.cliffWalking.CWState.<init>(CliffWalking.scala:18)
  at symsim.examples.concrete.cliffWalking.CWState$.apply(CliffWalking.scala:16)
  ... 42 elided

So I think this is not happening. We already had this problem that everything was hanging during the first episode (or it seems, after some debugging that this might be during the second episode). But I do not remember what was the reason.

@wasowski
Copy link
Member Author

Oh - some progress. I just discovered that I was misguided. It is the evaluation that is hanging, not learning! I have not been looking at the evaluation code, as I was sure it was learning. I will look into eval. It might be the same problem as with maze, that the early policies are bad and they need a timeout.

@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

@wasowski
Copy link
Member Author

Now we check it every time (in cliffwalking) when a new state is constructed. So this checks both when moving and when initializing. I will look into eval.

@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

@wasowski
Copy link
Member Author

OK. I added time horizon to mountaincar, cliffwalking, and windygrid. Now everything seems to behave. I think this is ready for your experiments. I hope this branch will allow all the experiments you need. I definitely hit heap problems still, but I believe we can handle larger cases now.

Some tests are still failing (about 10). All of them are due to randomness or the spire bug mentioned above (with Gaussian). I will stop now and move to other things.

@mohsen-ghaffari1992
Copy link
Collaborator

mohsen-ghaffari1992 commented Jan 13, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants