- agent: explore/exploit methods do not return the population
- agent trial numbering starts from 1 (not 0)
- yacs: prevent iterating over empty expected improvements when specializing a condition
- epsilon value is drawn from a uniform distribution
- epsilon supports values in the range [0, 1]
- move the logic for selecting the best action into each agent
- use the OpenAI Gym `ObservationWrapper` interface for transforming environment input
- dropped support for the `environment_adapter` configuration option
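The epsilon-based explore/exploit selection mentioned above can be sketched as follows. This is a minimal illustration, not the library's actual API: the function name, the list-of-Q-values input, and the helper structure are all hypothetical.

```python
import random

def choose_action(q_values, epsilon=0.1):
    """Hypothetical epsilon-greedy selector.

    A value is drawn from the uniform distribution on [0, 1); when it
    falls below epsilon the agent explores (random action), otherwise
    it exploits the best-known action.
    """
    assert 0.0 <= epsilon <= 1.0, "epsilon must lie in [0, 1]"
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the highest-valued action
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `epsilon=0` the selection is purely greedy; with `epsilon=1` every choice is random, matching the supported [0, 1] range.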