
An "over-optimistic" effort to read and summarize a Deep Reinforcement Learning based paper a day 🀩 πŸ‘Š


Summaries of Key Papers in Deep RL

Note: All summaries/insights (found in the Python notebooks) are written assuming the reader is conversant with the basics of RL and the standard RL literature. :bowtie:

  1. Model-Free RL

  2. Exploration

Model-Free RL

Deep Q-Learning

  • Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013. Algorithm: DQN. [paper] [Summary] (A minimal TD-target sketch follows this list.)

  • Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm: Deep Recurrent Q-Learning (DRQN). [paper] [Summary]

  • Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. Algorithm: Dueling DQN. [paper] [Summary]

  • Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015. Algorithm: Double DQN. [paper] [Summary]

  • Prioritized Experience Replay, Schaul et al, 2015. Algorithm: Prioritized Experience Replay (PER). [paper] [Summary]

  • Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. Algorithm: Rainbow DQN. [paper] [Summary]
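
The DQN-family targets above fit in a few lines. Below is a minimal, hypothetical NumPy sketch of the TD targets used by DQN and Double DQN; random arrays stand in for real Q-network outputs, and every name (q_online_next, q_target_next, a_star) is illustrative rather than taken from the papers' code.

```python
# Illustrative sketch only: random arrays stand in for Q-network outputs.
import numpy as np

rng = np.random.default_rng(0)
n_actions, batch = 4, 32
gamma = 0.99

q_online_next = rng.normal(size=(batch, n_actions))  # online net's Q(s', .)
q_target_next = rng.normal(size=(batch, n_actions))  # frozen target net's Q(s', .)
rewards = rng.normal(size=batch)
done = (rng.random(batch) < 0.1).astype(float)       # 1.0 where the episode ended

# DQN (Mnih et al, 2013): bootstrap from max_a Q_target(s', a).
y_dqn = rewards + gamma * (1.0 - done) * q_target_next.max(axis=1)

# Double DQN (van Hasselt et al, 2015): the online net picks the action,
# the target net evaluates it, which reduces overestimation bias.
a_star = q_online_next.argmax(axis=1)
y_double = rewards + gamma * (1.0 - done) * q_target_next[np.arange(batch), a_star]
```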

Policy Gradient

  • Asynchronous Methods for Deep Reinforcement Learning, Mnih et al, 2016. Algorithm: A3C. [paper] [Summary]

  • Trust Region Policy Optimization, Schulman et al, 2015. Algorithm: TRPO. [paper] [Summary]

  • High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2015. Algorithm: GAE. [paper] [Summary] (A sketch of the GAE recursion follows this list.)
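
As a companion to the GAE entry above, here is a minimal sketch of the GAE recursion from Schulman et al, 2015: δ_t = r_t + γV(s_{t+1}) − V(s_t), accumulated backwards with decay γλ. The function name and rollout arrays are hypothetical.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards:    (T,)   rewards r_0..r_{T-1}
    values:     (T,)   value estimates V(s_0)..V(s_{T-1})
    last_value: float  bootstrap V(s_T) (0.0 if the episode terminated)
    """
    T = len(rewards)
    next_values = np.append(values[1:], last_value)
    deltas = rewards + gamma * next_values - values   # one-step TD errors
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):                      # A_t = delta_t + gamma*lam*A_{t+1}
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

adv = gae(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.4, 0.6]), last_value=0.0)
```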

Distributional RL

  • A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017. Algorithm: C51. [paper] [Summary] (A sketch of the categorical projection follows this list.)

  • Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017. Algorithm: QR-DQN. [paper] [Summary]
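
The heart of C51 is projecting the Bellman-updated atoms r + γz_j back onto the fixed support {z_i}. The sketch below is an illustrative single-transition version (a real implementation vectorizes this over the batch); parameter names and defaults are assumptions, not the paper's code.

```python
import numpy as np

def project_c51(probs, reward, done, gamma=0.99, v_min=-10.0, v_max=10.0):
    """Project a categorical next-state distribution through the Bellman update."""
    n_atoms = len(probs)
    dz = (v_max - v_min) / (n_atoms - 1)
    z = v_min + dz * np.arange(n_atoms)               # fixed support z_0..z_{N-1}
    tz = np.clip(reward + gamma * (1.0 - done) * z, v_min, v_max)
    b = (tz - v_min) / dz                             # fractional atom index
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros(n_atoms)
    for j in range(n_atoms):                          # split mass between neighbours
        if lo[j] == hi[j]:
            m[lo[j]] += probs[j]
        else:
            m[lo[j]] += probs[j] * (hi[j] - b[j])
            m[hi[j]] += probs[j] * (b[j] - lo[j])
    return m                                          # projected target distribution

target = project_c51(np.full(51, 1 / 51), reward=1.0, done=0.0)
```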

Policy Gradients with Action-Dependent Baselines

  • Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, Gu et al, 2016. Algorithm: Q-Prop. [paper] [Summary]

Exploration

Intrinsic Motivation

  • VIME: Variational Information Maximizing Exploration, Houthooft et al, 2016. Algorithm: VIME. [paper] [Summary]

  • Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare et al, 2016. Algorithm: CTS-based Pseudocounts. [paper] [Summary] (A sketch of a count-based bonus follows this list.)
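
For intuition, here is a loose sketch of a count-based exploration bonus in the spirit of Bellemare et al, 2016. A plain dictionary of visit counts stands in for the CTS density model's pseudocounts, so this shows only the shape of the idea, not the paper's method.

```python
# Hypothetical stand-in: exact visit counts instead of CTS-based pseudocounts.
from collections import defaultdict
import math

counts = defaultdict(int)
beta = 0.01                      # bonus scale (illustrative value)

def intrinsic_bonus(state_key):
    """Return an exploration bonus that decays with visits, ~ beta / sqrt(N(s))."""
    counts[state_key] += 1
    return beta / math.sqrt(counts[state_key])

r_total = 1.0 + intrinsic_bonus(("x", 3))  # extrinsic reward + exploration bonus
```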

Unsupervised RL

  • Variational Intrinsic Control, Gregor et al, 2016. Algorithm: VIC. [paper] [Summary]

Last updated: 20/9/2020 ✔️