
Homework3-Policy Gradient

In this homework, you will use a neural network to learn a parameterized policy that can select actions without consulting a value function. A value function may still be used to learn the policy weights, but it is not required for action selection.

There are some advantages to policy-based algorithms (a minimal sketch follows the list below):

  • Policy-based methods offer useful ways of dealing with continuous action spaces.
  • For some tasks, the policy function is simpler and thus easier to approximate.
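
To make this concrete, here is a minimal sketch, assuming CartPole-v0's 4-dimensional observations and 2 discrete actions, of a parameterized softmax policy that samples actions directly from pi(a|s; theta) with no value function involved. The variable names are illustrative and not part of the assignment's starter code.

# A softmax policy parameterized by a weight matrix `theta`;
# actions are sampled from pi(a|s; theta) without a value function.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_actions = 4, 2                                  # CartPole-v0 sizes
theta = rng.normal(scale=0.1, size=(n_obs, n_actions))   # policy weights

def select_action(state, theta):
    logits = state @ theta
    logits -= logits.max()                    # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(n_actions, p=probs)     # sample, don't argmax

state = rng.normal(size=n_obs)                # stand-in for an observation
print(select_action(state, theta))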

Introduction

We will use CartPole-v0 as the environment in this homework: a pole is attached to a cart moving along a track, and the goal is to keep the pole upright by pushing the cart left or right.

For a more detailed description of the environment, see here
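
As a quick sanity check before you start, the snippet below runs one episode of CartPole-v0 with a random policy. It assumes the classic gym API of that era (reset() returns the observation; step() returns four values), which differs from newer gym/gymnasium releases.

# Run one episode of CartPole-v0 with random actions (classic gym API).
import gym

env = gym.make("CartPole-v0")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()        # uniformly random action
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)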

Setup

  • Python 3.5.3
  • OpenAI gym
  • tensorflow
  • numpy
  • matplotlib
  • ipython

We encourage you to install Anaconda or Miniconda on your laptop to avoid tedious dependency problems.

for lazy people:

conda env create -f environment.yml
source activate cedl
# deactivate when you want to leave the environment
source deactivate cedl

TODO

  • [60%] Problems 1, 2, 3: Policy gradient
  • [20%] Problem 5: Baseline bootstrapping
  • [10%] Problem 6: Generalized Advantage Estimation (a sketch follows this list)
    • for the lazy, you can refer to here
  • [10%] Report
  • [5%] Bonus: share your code and what you learned on GitHub or your personal blog, such as this
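
For Problem 6, here is a minimal sketch of Generalized Advantage Estimation under the usual definitions, A_t = sum_l (gamma*lam)^l * delta_{t+l} with delta_t = r_t + gamma*V(s_{t+1}) - V(s_t), assuming a finished episode so that V(s_T) = 0. The function name gae and the default gamma and lam values are illustrative, not prescribed by the homework.

# Generalized Advantage Estimation for one finished episode.
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.98):
    values = np.append(values, 0.0)           # V(s_T) = 0 at episode end
    deltas = rewards + gamma * values[1:] - values[:-1]   # TD residuals
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):   # backward recursion
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

print(gae(np.array([1.0, 1.0, 1.0]), np.array([0.5, 0.5, 0.5])))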

Other

  • Deadline: 11/2 23:59, 2017
  • Some of the code is credited to Yen-Chen Lin 😄
  • Office hours: 2-3 pm in Room 711 of the EECS building (資電館711) with Yuan-Hong Liao.
  • Contact [email protected] for bug reports or any questions.

About

The homework for the Cutting-Edge of Deep Learning (CEDL) course at NTHU
