Exhaustive Implementation of Algorithms, Key Papers, and Well-Known Problems of Reinforcement Learning

TroddenSpade/Exhaustive-Reinforcement-Learning

Exhaustive Reinforcement Learning

This repository aims to implement Deep Reinforcement Learning concepts exhaustively, drawing on well-known resources ranging from textbooks to lectures. Each concept comes with concise explanatory notes, and the associated algorithms are implemented alongside their environments and supporting modules. Key Reinforcement Learning papers and other worthwhile resources are cited at the end of this README.

Motivations

Table of Contents

Pseudocode and Algorithms

Textbooks Taxonomy

  • Tabular Methods

    • Bandit Problem
    • Dynamic Programming
    • Monte Carlo Methods
    • Temporal-Difference Learning
    • n-step Bootstrapping
    • Planning and Learning
  • Approximate Solution Methods

    • On-policy Prediction With Approximation
      • Gradient Monte Carlo
      • Semi-Gradient TD(0)
    • On-policy Control With Approximation
      • Semi-Gradient SARSA
      • Semi-Gradient n-step SARSA
    • Off-policy Control With Approximation
    • Eligibility Traces
    • Policy Gradient Methods
      • REINFORCE
      • one-step Actor-Critic
  • Deep Reinforcement Learning Methods

    • Value-Based Methods
      • Neural Fitted Q-function (NFQ)
      • DQN
      • DDQN
      • Dueling DDQN
      • PER
      • C51
      • QR-DQN
      • HER
    • Policy-Based Methods
      • REINFORCE
      • VPG
      • PPO
      • TRPO
    • Stochastic Actor-Critic Methods
      • A2C
      • A3C
      • GAE
      • ACKTR
    • Deterministic Actor-Critic Methods
      • Deep Deterministic Policy Gradient (DDPG)
      • TD3
      • SAC

Environments

  • Blackjack

    • Monte Carlo Prediction
    • Monte Carlo Exploring Starts
  • CartPole

    • Fully Connected Q-function
    • DQN
    • DDQN
    • Dueling DQN
  • Cliff Walking

    • SARSA
    • Q-Learning
    • Expected SARSA
  • Gambler's Problem

    • Value Iteration
  • Grid World

    • Iterative Policy Evaluation
  • Jack's Car Rental

    • Policy Iteration
  • Lunar Lander

    • REINFORCE using Non-linear Approximation
    • VPG
  • Small MDP (Maximization Bias)

    • Q-Learning
    • Double Q-Learning
  • Mountain Climbing

    • Semi-Gradient SARSA
    • Semi-Gradient n-step SARSA
  • Multi-Armed Bandit

    • Simple Bandit
    • Gradient Bandit
  • Pendulum Swing-Up

    • Actor-Critic using Tile-coding
    • Actor-Critic with Continuous Action Space
  • Random Walk

    • n-step TD Prediction
    • Gradient Monte Carlo State Aggregation
    • Gradient Monte Carlo Tile Coding
    • Semi-Gradient TD(0) State Aggregation
  • Short Corridor Gridworld

    • REINFORCE (Policy Gradient) using Linear Approximation
    • REINFORCE with Baseline
  • Windy Grid World

    • SARSA

Relevant Resources

Textbooks

Courses

Useful Blogs

Articles

Key Papers

Contribution

When contributing to this repository, please first discuss the change you wish to make with me via an issue, email, or any other method before making it.