
Documentation

Fabian-Sommer edited this page Dec 19, 2017 · 1 revision

Timeline

Started with score as the fitness function, many simple platforms, and moving through walls enabled. The neural network (NN) had 1 hidden layer and an output layer of size 2. The input consisted of multiple platforms, each described by its delta x and delta y.

Added a shared random seed for each generation so that all players play the same game, making their fitness values comparable.
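A minimal sketch of the shared-seed idea, assuming the level is generated with Python's `random` module. The names `generate_platforms` and `levels_for_generation`, as well as the platform layout, are hypothetical and not taken from the project.

```python
import random


def generate_platforms(rng, count=5):
    """Hypothetical level generator: returns a list of (x, y) platform positions."""
    return [(rng.uniform(0.0, 100.0), float(i * 10)) for i in range(count)]


def levels_for_generation(seed, num_players):
    """Build one level per player, each from a generator seeded identically."""
    # Every player gets its own Random instance with the same seed, so all
    # players face exactly the same sequence of platforms.
    return [generate_platforms(random.Random(seed)) for _ in range(num_players)]


levels = levels_for_generation(seed=42, num_players=3)
```

Because every player sees the same level, a difference in fitness reflects a difference in behavior rather than luck in level generation.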

Disabled moving through walls (it had led to a local optimum where players would always go right and hit some platforms on the way with high probability).

Added vertical speed to input.

Added cause of death to the fitness function, penalizing death by wall and death by lack of progress.

Refactored input so that the chosen input platforms are only changed whenever the player lands on a platform. Players are learning small jumps.
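One way to picture this refactoring is a small cache that re-selects the input platform only on landing. This is a sketch under assumptions: it tracks a single platform for brevity, uses a y-up coordinate system, and the names `PlatformInput` and `nearest_above` as well as the selection rule are hypothetical.

```python
class PlatformInput:
    """Caches the platform used as NN input until the player lands again."""

    def __init__(self, choose_platform):
        self.choose_platform = choose_platform  # hypothetical selection callback
        self.target = None

    def update(self, player_pos, platforms, just_landed):
        # Re-select only on the first call or right after a landing.
        if self.target is None or just_landed:
            self.target = self.choose_platform(player_pos, platforms)
        dx = self.target[0] - player_pos[0]
        dy = self.target[1] - player_pos[1]
        return dx, dy


def nearest_above(player_pos, platforms):
    """Assumed rule: pick the closest platform above the player (y grows upward)."""
    above = [p for p in platforms if p[1] > player_pos[1]]
    candidates = above or platforms
    return min(
        candidates,
        key=lambda p: abs(p[0] - player_pos[0]) + abs(p[1] - player_pos[1]),
    )


platforms = [(0, 10), (5, 20)]
tracker = PlatformInput(nearest_above)
first = tracker.update((0, 0), platforms, just_landed=False)
mid_air = tracker.update((4, 15), platforms, just_landed=False)
landed = tracker.update((4, 15), platforms, just_landed=True)
```

While the player is in the air, the target stays fixed, so the network sees a consistent delta to the same platform during the whole jump.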

Significantly reduced number of platforms. Only one reachable platform at any time, which is the only input platform.

Players can now also learn long jumps, though only after a long time.

Changed the mutation function to keep the best players from the previous generation.
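Keeping the best players unchanged is commonly called elitism. The sketch below shows one possible form, assuming each player is a flat list of weights, that children are mutated copies of randomly chosen elite players, and that mutation adds Gaussian noise. The names `next_generation` and `gaussian_mutate` are hypothetical, and the project's actual mutation function may differ.

```python
import random


def gaussian_mutate(weights, rng, sigma=0.1):
    """Return a copy of the weights with Gaussian noise added (assumed mutation)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def next_generation(population, fitnesses, num_elite, mutate, rng):
    """Carry over the num_elite fittest players; fill the rest with mutated copies."""
    ranked = [
        player
        for _, player in sorted(
            zip(fitnesses, population), key=lambda pair: pair[0], reverse=True
        )
    ]
    elite = ranked[:num_elite]
    # Children are mutated copies of randomly chosen elite players.
    children = [
        mutate(rng.choice(elite), rng) for _ in range(len(population) - num_elite)
    ]
    return elite + children


population = [[0.0], [1.0], [2.0], [3.0]]
fitnesses = [1, 5, 3, 2]
new_population = next_generation(
    population, fitnesses, 2, gaussian_mutate, random.Random(0)
)
```

With elitism, the best fitness in a generation cannot decrease as long as the game and the fitness function are deterministic for a given player.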

Removed the cause-of-death penalties from the fitness function, adding time instead (see Fitness functions).

Reenabled moving through walls - results ???

Fitness functions

Experiments were done with one input platform, a 4-20-3 neural net (4 inputs, 20 hidden units, 3 outputs), keeping the top 4 players after each generation, and no movement through walls.
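For orientation, a 4-20-3 feed-forward pass can be sketched in plain Python as follows. The tanh activation, the bias weights, the uniform initialization range, and the example input values are all assumptions made for illustration; the notes do not specify them.

```python
import math
import random

LAYER_SIZES = (4, 20, 3)


def init_weights(rng, sizes=LAYER_SIZES):
    """One weight matrix per layer; each row holds the input weights plus a bias."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(weights, inputs):
    """Propagate the inputs through every layer and return the output activations."""
    activations = list(inputs)
    for layer in weights:
        extended = activations + [1.0]  # the constant 1.0 feeds the bias weight
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, extended))) for row in layer
        ]
    return activations


net = init_weights(random.Random(0))
outputs = forward(net, [0.5, -0.2, 0.1, 0.0])
```

Under these assumptions the network has 20 * 5 + 3 * 21 = 163 weights, which is the size of the vector a mutation step would operate on.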

score/time

Fitness is almost constant; no visible learning could be observed.

score - time

NN finds a local optimum where it kills itself as quickly as possible by always going right/left.

max(score - time, 0)

Still no visible learning.

max(score^2 - time, 0)

The fitness function is dominated by score; time serves as a tiebreaker for players with equal score.
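A short worked example of this fitness function. The concrete score and time values are made up for illustration, and the units of both are assumptions.

```python
def fitness(score, time):
    """max(score^2 - time, 0): the squared score dominates, time is subtracted."""
    return max(score ** 2 - time, 0)


slow = fitness(50, 120)    # 2500 - 120 = 2380
fast = fitness(50, 90)     # 2500 - 90 = 2410, faster player wins at equal score
better = fitness(51, 120)  # 2601 - 120 = 2481, one more point outweighs the time
early = fitness(1, 5)      # 1 - 5 is negative, so the result is clamped to 0
```

Note that score only dominates once it is large relative to time: going from score 50 to 51 adds 101 to the fitness, whereas at very low scores the time term can still outweigh a score difference.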

Run 1: An almost perfect player developed after 800 generations.

Run 2: Steady progress to about 500 score after 200 generations, then slow but somewhat steady progress.

Run 3: Stagnated at a score of 50 until generation 200, then suddenly jumped to a score of 500, followed by slow but steady progress. At some point, performance degraded back to the starting level. Why?

max(0.8 * norm(score) - 0.2 * norm(time), 0)

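The best player should always have a fitness of 0.6 (that is, 0.8 - 0.2, presumably because norm maps both the best score and the longest time in a generation to 1). This makes it unsuitable for comparing fitness across generations (as is needed when keeping the best player), but it might do well otherwise.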
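The following sketch illustrates why this fitness is not comparable across generations. It assumes that norm divides each value by the maximum within the current generation, which the notes do not state; the function name `normalized_fitness` and the sample values are hypothetical.

```python
def normalized_fitness(scores, times):
    """Per-generation fitness, assuming norm(x) = x / max(x) within the generation."""
    max_score = max(scores) or 1  # avoid division by zero if everyone scored 0
    max_time = max(times) or 1
    return [
        max(0.8 * s / max_score - 0.2 * t / max_time, 0)
        for s, t in zip(scores, times)
    ]


times = [40, 30, 5]
generation_a = normalized_fitness([100, 50, 10], times)
# Every player scores twice as much, yet the fitness values do not change.
generation_b = normalized_fitness([200, 100, 20], times)
```

Under this assumption, a player with both the highest score and the longest time gets roughly 0.6 in either generation, so an absolute improvement between generations is invisible in the fitness value.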
