This repository aims to implement a broad range of Deep Reinforcement Learning concepts, covering most of the well-known resources, from textbooks to lectures. For each concept, concise notes explain the idea, and the associated algorithms are implemented together with their environments and peripheral modules. Key Reinforcement Learning papers and other worthwhile resources are cited at the end of this README.
- Pseudocode and Algorithms
- Implementations and Environments
- Relevant Resources
- Key Papers
- Contribution
Tabular Methods
- Bandit Problem
- Dynamic Programming
- Monte Carlo Methods
- Temporal-Difference Learning (see the Q-learning sketch after this list)
- n-step Bootstrapping
- Planning and Learning
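As a companion to the temporal-difference entry above, here is a minimal, hedged Q-learning sketch on a toy 1-D chain. The chain, reward scheme, and hyperparameters are illustrative placeholders and are not taken from the implementations in this repository.
```python
import numpy as np

# Minimal tabular Q-learning sketch on a toy 1-D chain (illustrative only; the
# chain size and hyperparameters below are placeholders, not this repo's settings).

N_STATES = 7                  # states 0..6; 0 and 6 are terminal, reward +1 only at state 6
GOAL = N_STATES - 1
ACTIONS = (-1, +1)            # move left or right

def step(state, action_idx):
    """Apply an action and return (next_state, reward, done)."""
    next_state = state + ACTIONS[action_idx]
    if next_state <= 0:
        return 0, 0.0, True
    if next_state >= GOAL:
        return GOAL, 1.0, True
    return next_state, 0.0, False

def q_learning(episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(episodes):
        state, done = N_STATES // 2, False
        while not done:
            # epsilon-greedy behaviour policy (ties broken at random)
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(rng.choice(np.flatnonzero(Q[state] == Q[state].max())))
            next_state, reward, done = step(state, action)
            # off-policy TD target: bootstrap from the greedy value of the next state
            target = reward + (0.0 if done else gamma * float(Q[next_state].max()))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q

if __name__ == "__main__":
    print(np.round(q_learning(), 3))
```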
Approximate Solution Methods
- On-policy Prediction With Approximation
- Gradient Monte Carlo
- Semi-Gradient TD(0) (see the prediction sketch after this list)
- On-policy Control With Approximation
- Semi-Gradient SARSA
- Semi-Gradient n-step SARSA
- Off-policy Control With Approximation
- Eligibility Traces
- Policy Gradient Methods
- REINFORCE
- one-step Actor-Critic
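The sketch below illustrates the semi-gradient TD(0) entry above: state-value prediction with state aggregation on a simple random walk. The chain size, number of groups, and step size are assumed placeholders, not this repository's settings.
```python
import numpy as np

# Semi-gradient TD(0) prediction with state aggregation on a simple random walk.
# Illustrative sketch only; sizes and the step size below are placeholders.

N_STATES = 100            # non-terminal states 1..100
N_GROUPS = 10             # state aggregation: 10 states per group (one weight per group)
STEP = 10                 # each move jumps up to 10 states left or right, uniformly

def group(state):
    """Map a state (1-based) to its aggregation group / feature index."""
    return (state - 1) // (N_STATES // N_GROUPS)

def semi_gradient_td0(episodes=2000, alpha=2e-2, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_GROUPS)                     # v_hat(s) = w[group(s)]
    for _ in range(episodes):
        state = N_STATES // 2
        while True:
            jump = int(rng.integers(1, STEP + 1)) * (1 if rng.random() < 0.5 else -1)
            next_state = state + jump
            if next_state < 1:                 # terminate on the left with reward -1
                reward, v_next, done = -1.0, 0.0, True
            elif next_state > N_STATES:        # terminate on the right with reward +1
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, w[group(next_state)], False
            # semi-gradient update: the gradient of v_hat w.r.t. w is the one-hot group feature
            td_error = reward + gamma * v_next - w[group(state)]
            w[group(state)] += alpha * td_error
            if done:
                break
            state = next_state
    return w

if __name__ == "__main__":
    print(np.round(semi_gradient_td0(), 3))
```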
Deep Reinforcement Learning Methods
- Value-Based Methods
- Neural Fitted Q-function (NFQ)
- DQN
- DDQN (see the target-computation sketch after this list)
- Dueling DDQN
- PER
- C51
- QR-DQN
- HER
- Policy-Based Methods
- REINFORCE
- VPG
- PPO
- TRPO
- Stochastic Actor-Critic Methods
- A2C
- A3C
- GAE
- ACKTR
- Deterministic Actor-Critic Methods
- Deep Deterministic Policy Gradient (DDPG)
- TD3
- SAC
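For the value-based entries above (DQN/DDQN), here is a hedged PyTorch sketch of how the vanilla DQN and Double DQN bootstrap targets differ. It assumes PyTorch is available; the network sizes, the random batch, and the hyperparameters are placeholders, not the configuration used in this repository.
```python
import torch
import torch.nn as nn

# Hedged sketch of the DQN vs. Double DQN bootstrap targets (network sizes and
# batch contents are made-up placeholders, not this repository's configuration).

def make_q_net(obs_dim=4, n_actions=2, hidden=64):
    return nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

@torch.no_grad()
def td_targets(online_net, target_net, rewards, next_obs, dones, gamma=0.99, double=True):
    """Compute r + gamma * Q_target(s', a*) for a batch of transitions."""
    next_q_target = target_net(next_obs)                          # (B, n_actions)
    if double:
        # Double DQN: the online network selects a*, the target network evaluates it
        best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        next_values = next_q_target.gather(1, best_actions).squeeze(1)
    else:
        # Vanilla DQN: the target network both selects and evaluates the action
        next_values = next_q_target.max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_values

if __name__ == "__main__":
    online, target = make_q_net(), make_q_net()
    target.load_state_dict(online.state_dict())                   # periodic hard sync, as in DQN

    batch = 32                                                     # placeholder random batch
    obs = torch.randn(batch, 4)
    actions = torch.randint(0, 2, (batch, 1))
    rewards = torch.randn(batch)
    next_obs = torch.randn(batch, 4)
    dones = torch.randint(0, 2, (batch,)).float()

    targets = td_targets(online, target, rewards, next_obs, dones, double=True)
    q_sa = online(obs).gather(1, actions).squeeze(1)              # Q(s, a) for taken actions
    loss = nn.functional.smooth_l1_loss(q_sa, targets)            # Huber loss, common for DQN
    loss.backward()                                               # an optimizer step would follow
    print(loss.item())
```
Keeping the target computation under `torch.no_grad()` mirrors the usual practice of not backpropagating through the bootstrap target.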
Black Jack
- Monte Carlo Prediction
- Monte Carlo Exploring Starts
CartPole
- Fully Connected Q-function
- DQN
- DDQN
- Dueling DQN
Cliff Walking
- SARSA
- Q-Learning
- Expected SARSA
Gambler's Problem
- Value Iteration (see the sketch below)
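For the value-iteration entry above, here is a hedged sketch of value iteration on a gambler's-problem-style MDP (betting on coin flips until reaching a capital goal). The head probability, goal, and convergence threshold are illustrative placeholders.
```python
import numpy as np

# Hedged value-iteration sketch for a gambler's-problem-style MDP; the head
# probability and goal below are placeholders, not this repository's settings.

GOAL = 100
P_HEADS = 0.4     # probability the coin comes up heads

def action_values(V, s, gamma=1.0):
    """Expected value of each stake; reaching GOAL yields reward 1, ruin yields 0."""
    stakes = list(range(1, min(s, GOAL - s) + 1))
    values = [P_HEADS * ((1.0 if s + a == GOAL else 0.0) + gamma * V[s + a])
              + (1 - P_HEADS) * gamma * V[s - a]
              for a in stakes]
    return stakes, values

def value_iteration(threshold=1e-9):
    V = np.zeros(GOAL + 1)            # terminal values (0 and GOAL) stay at 0
    while True:
        delta = 0.0
        for s in range(1, GOAL):
            _, values = action_values(V, s)
            best = max(values)
            delta = max(delta, abs(best - V[s]))
            V[s] = best                # in-place (Gauss-Seidel style) sweep
        if delta < threshold:
            break
    # greedy policy with respect to the converged value function
    policy = np.zeros(GOAL + 1, dtype=int)
    for s in range(1, GOAL):
        stakes, values = action_values(V, s)
        policy[s] = stakes[int(np.argmax(values))]
    return V, policy

if __name__ == "__main__":
    V, pi = value_iteration()
    print("V[50] =", round(float(V[50]), 4), " greedy stake at 50:", int(pi[50]))
```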
Grid World
- Iterative Policy Evaluation
Jack's Car Rental
- Policy Iteration
Lunar Lander
- REINFORCE using Non-linear Approximation
- VPG
Small MDP (Maximization Bias)
- Q-Learning
- Double Q-Learning
Mountain Climbing
- Semi-Gradient SARSA
- Semi-Gradient n-step SARSA
Multi-Armed Bandit
- Simple Bandit (see the sketch after this list)
- Gradient Bandit
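For the simple-bandit entry above, a hedged sketch of epsilon-greedy action selection with incremental sample-average estimates. The number of arms, epsilon, and horizon are illustrative placeholders.
```python
import numpy as np

# Hedged sketch of a simple epsilon-greedy bandit with sample-average estimates.
# The number of arms, epsilon, and step count below are placeholders.

def simple_bandit(n_arms=10, steps=2000, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    true_values = rng.normal(0.0, 1.0, n_arms)   # hidden mean reward of each arm
    q = np.zeros(n_arms)                         # sample-average value estimates
    counts = np.zeros(n_arms, dtype=int)
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))                          # explore
        else:
            arm = int(rng.choice(np.flatnonzero(q == q.max())))      # exploit, random tie-break
        reward = rng.normal(true_values[arm], 1.0)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]                    # incremental average update
        total_reward += reward
    return q, true_values, total_reward / steps

if __name__ == "__main__":
    estimates, truth, avg = simple_bandit()
    print("best arm:", int(np.argmax(truth)), "greedy pick:", int(np.argmax(estimates)))
    print("average reward:", round(avg, 3))
```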
Pendulum Swing-Up
- Actor-Critic using Tile-coding
- Actor-Critic with Continuous Action Space
Random Walk
- n-step TD Prediction
- Gradient Monte Carlo State Aggregation
- Gradient Monte Carlo Tile Coding
- Semi-Gradient TD(0) State Aggregation
Short Corridor Gridworld
- REINFORCE (Policy Gradient) using Linear Approximation (see the sketch after this list)
- REINFORCE with Baseline
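The sketch below illustrates REINFORCE with a linear softmax policy on a short-corridor-style task with state-independent features. The step size, episode count, and safety cap are assumed placeholders, not this repository's settings.
```python
import numpy as np

# Hedged REINFORCE sketch with a linear softmax policy on a short-corridor-style
# task (action effects reversed in the middle state, state-independent features).

FEATURES = np.array([[1.0, 0.0],   # feature vector for action 0 ("right")
                     [0.0, 1.0]])  # feature vector for action 1 ("left")

def step(state, action):
    """Reward -1 per step; state 3 is terminal; moves are reversed in state 1."""
    move = +1 if action == 0 else -1
    if state == 1:
        move = -move
    next_state = max(0, state + move)
    return next_state, -1.0, next_state == 3

def policy(theta):
    prefs = FEATURES @ theta
    prefs -= prefs.max()                        # numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def reinforce(episodes=3000, alpha=2e-4, gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(episodes):
        actions, rewards, state, done = [], [], 0, False
        while not done and len(rewards) < 10_000:      # safety cap on episode length
            action = int(rng.choice(2, p=policy(theta)))
            state, reward, done = step(state, action)
            actions.append(action)
            rewards.append(reward)
        # Monte Carlo returns, then per-step policy-gradient updates
        returns, G = np.zeros(len(rewards)), 0.0
        for t in reversed(range(len(rewards))):
            G = rewards[t] + gamma * G
            returns[t] = G
        for t in range(len(rewards)):
            probs = policy(theta)
            grad_log_pi = FEATURES[actions[t]] - probs @ FEATURES
            theta += alpha * (gamma ** t) * returns[t] * grad_log_pi
        # (a learned baseline can be subtracted from the return to reduce variance)
    return theta, policy(theta)

if __name__ == "__main__":
    theta, probs = reinforce()
    print("P(right) =", round(float(probs[0]), 3))
```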
Windy Grid World
- SARSA
Relevant Resources
- Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto
- Algorithms for Reinforcement Learning. Csaba Szepesvari
- Foundations of Deep Reinforcement Learning: Theory and Practice in Python. Laura Graesser and Wah Loon Keng
- Grokking Deep Reinforcement Learning. Miguel Morales
- Deep Reinforcement Learning Hands-On: Apply modern RL methods to practical problems of chatbots, robotics, discrete optimization, web automation, and more, 2nd Edition. Maxim Lapan
- Deep Reinforcement Learning with Python, Second Edition. Sudharsan Ravichandiran
- Deep Reinforcement Learning in Action. Brandon Brown and Alexander Zai
- Deep Reinforcement Learning: Fundamentals, Research and Applications. Hao Dong, Zihan Ding, and Shanghang Zhang
Key Papers
Artificial Intelligence
Reinforcement Learning
Deep Reinforcement Learning
- Actor-Critic
- REINFORCE - Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992).
- Deep Reinforcement Learning
Value-based Methods
- NFQ - Riedmiller, M. Neural fitted Q iteration - First experiences with a data efficient neural Reinforcement Learning method. in Lecture Notes in Computer Science vol. 3720 LNAI 317–328 (Springer, Berlin, Heidelberg, 2005).
- DQN - Mnih, V. et al. Playing Atari with Deep Reinforcement Learning. (2013).
- DDQN - van Hasselt, H., Guez, A. & Silver, D. Deep Reinforcement Learning with Double Q-learning. 30th AAAI Conf. Artif. Intell. AAAI 2016 2094–2100 (2015).
- Dueling DQN - Wang, Z. et al. Dueling Network Architectures for Deep Reinforcement Learning. 33rd Int. Conf. Mach. Learn. ICML 2016 4, 2939–2947 (2015).
- PER - Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized Experience Replay. 4th Int. Conf. Learn. Represent. ICLR 2016 - Conf. Track Proc. (2015).
- Rainbow - Hessel, M. et al. Rainbow: Combining Improvements in Deep Reinforcement Learning. 32nd AAAI Conf. Artif. Intell. AAAI 2018 3215–3222 (2017).
Policy-based Methods
Actor-Critic Methods
- AC
- A3C/A2C - Mnih, V. et al. Asynchronous Methods for Deep Reinforcement Learning. 33rd Int. Conf. Mach. Learn. ICML 2016 4, 2850–2869 (2016).
- GAE - Schulman, J., Moritz, P., Levine, S., Jordan, M. I. & Abbeel, P. High-dimensional continuous control using generalized advantage estimation. in 4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings (International Conference on Learning Representations, ICLR, 2016).
- PPO
- TRPO
Deterministic Actor-Critic Methods
- DPG - Silver, D. et al. Deterministic Policy Gradient Algorithms. 387–395 (2014).
- DDPG - Lillicrap, T. P. et al. Continuous control with deep reinforcement learning. 4th Int. Conf. Learn. Represent. ICLR 2016 - Conf. Track Proc. (2015).
- TD3 - Fujimoto, S., van Hoof, H. & Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. 35th Int. Conf. Mach. Learn. ICML 2018 4, 2587–2601 (2018).
- SAC - Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. 35th Int. Conf. Mach. Learn. ICML 2018 5, 2976–2989 (2018).
When contributing to this repository, please first discuss the change you wish to make with me via an issue, email, or any other method before making the change.