Final Experiments To Do

Tonio Weidler edited this page Jan 21, 2019 · 33 revisions

Pre-Experiment:

Comparison of DDQN with PrioritizedDDQN

If there are no significant differences in performance, we will only use a DDQN with prioritized memory, which reduces the total number of experiments to run.

Task: Tunnel

Representation-learner = Flatten

Parameters:

memory_delay = 0

init_eps = 1.0

memory_eps = 0.8

min_eps = 0.01

eps_decay = 20000
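The page does not say how these epsilon parameters combine into an exploration schedule. As a rough sketch only, the snippet below assumes a common exponential form in which epsilon starts at `init_eps`, decays with time constant `eps_decay`, and approaches `min_eps`. The function name `epsilon_at` and the decay formula itself are assumptions, not taken from the project code. `memory_eps` and `memory_delay` are left out because their exact meaning is not described here.

```python
import math

# Values copied from the parameter list above.
INIT_EPS = 1.0
MIN_EPS = 0.01
EPS_DECAY = 20000


def epsilon_at(step, init_eps=INIT_EPS, min_eps=MIN_EPS, eps_decay=EPS_DECAY):
    """Exploration rate after `step` steps.

    ASSUMPTION: exponential decay towards min_eps; the project's actual
    schedule may differ (e.g. linear decay).
    """
    return min_eps + (init_eps - min_eps) * math.exp(-step / eps_decay)


if __name__ == "__main__":
    # Print the exploration rate at a few points of the schedule.
    for step in (0, 20000, 100000):
        print(step, round(epsilon_at(step), 4))
```

Under this assumed form, `eps_decay` is the number of steps after which the gap between the current epsilon and `min_eps` has shrunk by a factor of e.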

| Policy-learner | Score | Training-time (Note: not on same machine) |
|---|---|---|
| DDQN | | |
| PrioDDQN | 199.1 | 63min |

I also ran experiments on CartPole using all three architectures for 10,000 episodes.

| Policy-learner | Score | Training-time |
|---|---|---|
| DDQN | 200 | 111.63min |
| PrioDDQN | 114.6 | 72.93min |
| PrioDuelingDQN | 56.9 | 27min (mostly due to the bad performance) |

Final Experiments:

Policy-learners: DDQN

Representation-learners: ConvolutionalJanusAE, ConvolutionalCerberusAE, ConvolutionalVariationalAE

Tasks: (Tunnel, Evasion, EvasionWalls), Tunnel, (4 Pathing-tasks)

Parameters:

Train for 1,000,000 episodes

n_hidden=32

memory_delay = 100000

init_eps = 1.0

memory_eps = 0.8

min_eps = 0.01

eps_decay = 40000000
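The setup above amounts to a small grid: one policy-learner, three representation-learners and three task sets, all sharing the same parameters. The sketch below shows one possible way to enumerate that grid. The `RunConfig` class, its field names and `build_runs` are hypothetical and not part of the project code. The four pathing tasks are not named on this page, so they appear as a single placeholder label. The enumeration order need not match the IDs in the table below.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for one experiment; defaults mirror the list above."""
    policy_learner: str
    representation_learner: str
    tasks: tuple
    episodes: int = 1_000_000
    n_hidden: int = 32
    memory_delay: int = 100_000
    init_eps: float = 1.0
    memory_eps: float = 0.8
    min_eps: float = 0.01
    eps_decay: int = 40_000_000


POLICY_LEARNERS = ["DDQN"]
REPRESENTATION_LEARNERS = [
    "ConvolutionalJanusAE",
    "ConvolutionalCerberusAE",
    "ConvolutionalVariationalAE",
]
TASK_SETS = [
    ("Tunnel", "Evasion", "EvasionWalls"),
    ("Tunnel",),
    ("4 Pathing-tasks",),  # placeholder: the individual task names are not listed
]


def build_runs():
    """Return one RunConfig per (policy, representation, task set) combination."""
    return [
        RunConfig(policy, representation, tasks)
        for policy, representation, tasks in product(
            POLICY_LEARNERS, REPRESENTATION_LEARNERS, TASK_SETS
        )
    ]


if __name__ == "__main__":
    for index, run in enumerate(build_runs(), start=1):
        print(index, run.policy_learner, run.representation_learner, run.tasks)
```

Keeping the shared parameters as defaults in one place makes it harder for individual runs to drift apart when they are started by different people on different machines.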

| ID | Policy-learner | Representation-learner | Task | Running | Done | Score | On Colab | Actively running on Cluster | Progress |
|---|---|---|---|---|---|---|---|---|---|
| 1 | DDQN | ConvJanus | (Tunnel, Evasion, EvasionWalls) | [x] | [] | | Danni(Local) | | 320K as of 18.00 Sun. |
| 2 | DDQN | ConvJanus | Tunnel | [x] | [] | | Kevin | | 330k as of 13.00 Sun |
| 3 | DDQN | ConvCerberus | (Tunnel, Evasion, EvasionWalls) | [x] | [] | | Adrian | till ~23.00 Tue. | 230K as of 18.00 Sun. |
| 4 | DDQN | ConvCerberus | Tunnel | [] | [] | | Tonio | | |
| 5 | DDQN | ConvVAE | (Tunnel, Evasion, EvasionWalls) | [x] | [] | | Danni(Azure) | | 320K as of 18.00 Sun. |
| 6 | DDQN | ConvVAE | Tunnel | [x] | [] | | Alessandro | | 220k as of 19.00 Sun. |
| 7 | DDQN | ConvJanus | (4 Pathing) | [x] | [] | | | till ~ 10.30 Mon. | 23K as of 18.00 Sun. |
| 8 | DDQN | ConvCerberus | (4 Pathing) | [] | [] | | | queuing | |
| 9 | DDQN | ConvVAE | (4 Pathing) | [] | [] | | | | |
| 10 | DDQN | ConvVAE | (Scrollers) only update DDQN | [] | [] | | | | |

Additional Experiments to show in the Report

Isolated Module Experiments
  • representation module performance on its own
    • single task
    • multi task
  • DDQN performance on its own