In this homework, you will use a neural network to learn a parameterized policy that can select actions without consulting a value function. A value function may still be used to learn the policy weights, but it is not required for action selection.
Policy-based algorithms offer some advantages:
- Policy-based methods offer useful ways of dealing with continuous action spaces.
- For some tasks, the policy function is simpler and thus easier to approximate.
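To make "learning a parameterized policy" concrete, here is a minimal REINFORCE-style sketch in TensorFlow. The network shape, variable names, and learning rate are our own illustrative assumptions, not the homework's starter code:

```python
import tensorflow as tf

# A minimal policy-gradient (REINFORCE) sketch, assuming a discrete
# action space; all names and sizes here are illustrative.
n_obs, n_actions, hidden = 4, 2, 16  # CartPole-like dimensions

obs_ph = tf.placeholder(tf.float32, [None, n_obs])  # states
act_ph = tf.placeholder(tf.int32, [None])           # actions taken
ret_ph = tf.placeholder(tf.float32, [None])         # discounted returns

# Policy network: state -> action probabilities (via logits).
h = tf.layers.dense(obs_ph, hidden, activation=tf.nn.tanh)
logits = tf.layers.dense(h, n_actions)
log_probs = tf.nn.log_softmax(logits)

# Log-probability of the actions that were actually taken.
indices = tf.stack([tf.range(tf.shape(act_ph)[0]), act_ph], axis=1)
taken_log_probs = tf.gather_nd(log_probs, indices)

# Surrogate loss: maximizing E[log pi(a|s) * R] is done by
# minimizing its negation with a standard optimizer.
loss = -tf.reduce_mean(taken_log_probs * ret_ph)
train_op = tf.train.AdamOptimizer(1e-2).minimize(loss)
```

Note that the policy is updated purely from sampled returns here; no value function is needed to pick an action.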
We will use CartPole-v0 as the environment in this homework. The following GIF visualizes the CartPole environment:
For a further description, please see here.
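If you have not used gym before, below is a minimal sketch of the interaction loop (a random agent, not part of the homework code), just to show the `reset`/`step` interface:

```python
import gym

# Run one episode of CartPole-v0 with a uniformly random policy.
env = gym.make('CartPole-v0')
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random action: 0 or 1
    obs, reward, done, info = env.step(action)  # advance one timestep
    total_reward += reward
print('episode return:', total_reward)
```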
- Python 3.5.3
- OpenAI gym
- tensorflow
- numpy
- matplotlib
- ipython
We encourage you to install Anaconda or Miniconda on your laptop to avoid tedious dependency problems.
For the lazy:
```bash
conda env create -f environment.yml
source activate cedl
# deactivate when you want to leave the environment
source deactivate cedl
```
- [60%] Problems 1, 2, 3: Policy gradient
- [20%] Problem 5: Baseline bootstrapping
- [10%] Problem 6: Generalized Advantage Estimation (see the GAE sketch after this list)
- For the lazy, you can refer to here
- [10%] Report
- [5%] Bonus: share your code and what you learned on GitHub or your personal blog, such as this
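Since Problem 6 asks for Generalized Advantage Estimation, here is a minimal NumPy sketch of the standard GAE recursion, $A_t = \delta_t + \gamma\lambda A_{t+1}$ with $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$. The function name and signature are our own illustration, not the starter code's:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one finished episode.

    rewards: array of length T
    values:  array of length T + 1 (use V(s_T) = 0 for a terminal state)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    # Work backwards: A_t = delta_t + gamma * lam * A_{t+1},
    # where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=1` recovers the plain Monte-Carlo advantage, while `lam=0` gives the one-step TD advantage; values in between trade off bias and variance.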
- Deadline: 11/2 23:59, 2017
- Some of the code is credited to Yen-Chen Lin 😄
- Office hours: 2-3 pm in 資電館 711 with Yuan-Hong Liao.
- Contact [email protected] for bug reports or any questions.