Reinforcement Learning: An Introduction, 2nd Edition, written by Richard S. Sutton and Andrew G. Barto, is something of a bible of reinforcement learning. It is required reading for students and researchers who want the proper context for the continually developing fields of RL and AI.
Links to buy or rent a hardcover or ebook: MIT Press, Amazon (the paperback version is generally not recommended because of its poor printing quality).
Although the authors have made the book extremely clear and friendly to readers at every level, it can still be intimidating to RL or ML beginners because of its dense concepts, abstract examples and algorithms, and sheer volume. Therefore, as an RL researcher, I am trying to extract the key points and implement the examples and exercises in the book, to help more people better understand the valuable knowledge it generously provides.
My work mainly consists of:
- Turning examples into code and plots that are as close to those in the book as possible;
- Implementing algorithms in Python and testing them with RL playground packages like Gymnasium (a minimal interaction loop is sketched below);
- Taking notes and organizing them as PDF files per chapter.
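For reference, a minimal Gymnasium interaction loop looks roughly like the following; `CartPole-v1` here is just an illustrative environment choice, not one of the book's examples, and the random policy is a placeholder for a learned one.

```python
import gymnasium as gym

# Create an environment and run one episode with random actions.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # replace with a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"episode return: {episode_return}")
env.close()
```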
Chapter 2: Multi-armed Bandits 🔗 link
This chapter starts with the bandit problem and introduces strategies like ε-greedy action selection, optimistic initial values, upper-confidence-bound (UCB) action selection, and gradient bandit algorithms.
- A k-armed bandit testbed:
- Parameter study (algorithm comparison) - stationary environment
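To give a flavor of the testbed, below is a minimal sketch of an ε-greedy agent with sample-average value estimates on a stationary k-armed bandit; the arm count, ε, and number of steps are illustrative values, not the book's exact experiment settings.

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps, steps = 10, 0.1, 1000
q_true = rng.normal(0, 1, k)        # true action values q*(a)
Q = np.zeros(k)                     # sample-average estimates
N = np.zeros(k)                     # action counts

rewards = []
for t in range(steps):
    # epsilon-greedy action selection
    a = rng.integers(k) if rng.random() < eps else int(np.argmax(Q))
    r = rng.normal(q_true[a], 1.0)  # reward drawn around the true value
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]       # incremental sample-average update
    rewards.append(r)

print(f"average reward over {steps} steps: {np.mean(rewards):.3f}")
```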
Chapter 3: Finite Markov Decision Processes 🔗 link
This chapter introduces the fundamentals of finite Markov decision processes, such as the agent-environment interaction, goals and rewards, returns and episodes, and policies and value functions. It helps build up a basic understanding of the components of reinforcement learning.
- Optimal solution to the gridworld example:
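As a concrete illustration of the return, the discounted return G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ... can be computed from one episode's reward sequence as in the small sketch below; the reward sequence is made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G_0 = R_1 + gamma*R_2 + gamma^2*R_3 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):  # fold from the end: G_t = R_{t+1} + gamma * G_{t+1}
        g = r + gamma * g
    return g

print(discounted_return([0, 0, 1, 0, 5]))  # example reward sequence
```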
Chapter 4: Dynamic Programming 🔗 link
The dynamic programming (DP) methods introduced in this chapter include policy iteration, which consists of policy evaluation and policy improvement, and value iteration, which can be considered a concise and efficient version of policy iteration. The chapter also raises the idea that the evaluation and improvement processes compete with each other yet cooperate to find the optimal value function and an optimal policy.
- Jack's Car Rental example
- Gambler's problem
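As a rough sketch of value iteration on a generic finite MDP: the transition-model format `P[s][a]` = list of `(prob, next_state, reward)` tuples below is an assumption made for illustration, not the repository's actual code.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Value iteration on a finite MDP.

    P[s][a] is assumed to be a list of (prob, next_state, reward) tuples.
    """
    def q_value(V, s, a):
        # Expected one-step return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once the value function has stabilized
            break

    # Extract a greedy deterministic policy from the converged values.
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy
```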
Chapter 5: Monte Carlo Methods 🔗 link
Monte Carlo methods can be used to learn optimal behavior directly from interaction with the environment, with no model of the environment's dynamics. The chapter introduces on-policy MC methods, such as first-visit Monte Carlo prediction and Monte Carlo control with/without exploring starts, and off-policy MC methods based on ordinary/weighted importance sampling.
- The infinite variance of ordinary importance sampling
- Racetrack
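For instance, first-visit Monte Carlo prediction of v_π can be sketched as below; `generate_episode` is an assumed helper that returns one episode as a list of (state, reward) pairs generated by following the target policy.

```python
from collections import defaultdict

def first_visit_mc_prediction(generate_episode, n_episodes, gamma=1.0):
    """Estimate v_pi by averaging returns that follow the first visit to each state.

    generate_episode() is an assumed helper returning one episode as
    [(S_0, R_1), (S_1, R_2), ...] produced by following the target policy.
    """
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    V = defaultdict(float)
    for _ in range(n_episodes):
        episode = generate_episode()
        # Record the time step of each state's first visit.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        # Walk backwards through the episode accumulating the return G_t.
        g = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            g = r + gamma * g
            if first_visit[s] == t:  # only average returns from first visits
                returns_sum[s] += g
                returns_cnt[s] += 1
                V[s] = returns_sum[s] / returns_cnt[s]
    return V
```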
Chapter 6: Temporal-Difference Learning 🔗 link
This chapter introduces temporal-difference (TD) learning and shows how it can be applied to the reinforcement learning problem. The TD control methods are classified according to whether they deal with the exploration complication by using an on-policy (Sarsa, Expected Sarsa) or off-policy (Q-learning) approach. The chapter also discusses using double learning to avoid the maximization bias problem.
- Comparison of TD(0) and MC on Random Walk environment
- Interim and Asymptotic Performance of TD methods
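As a reminder of how compact the tabular rules in this chapter are, here is a sketch of a single Q-learning update; the array layout of Q and the toy call at the end are illustrative only.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """Off-policy TD control: Q(S,A) += alpha * (R + gamma * max_a Q(S',a) - Q(S,A))."""
    target = r if terminal else r + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

# Usage sketch on a toy table with 5 states and 2 actions (made-up transition).
Q = np.zeros((5, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
print(Q[0])
```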
Chapter 7: n-step Bootstrapping 🔗 link
In progress