
# Reinforcement Learning

This repo holds all programming assignments completed for my Reinforcement Learning course (Fall 2022).

Note: Scaffolding code was provided for some of these assignments. All of my own work is located inside block comments labeled `##### MY WORK START #####` and `##### MY WORK END #####`.

## Assignment Descriptions

### Ex0 --- Exploration Policies

An introduction to reinforcement learning and policies --- the rewards and effects of random, expected-better, and expected-worse policies.
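A policy here is just a rule for choosing actions. As a minimal illustration (a hypothetical two-action Gaussian bandit, not the assignment's actual environment), the three policy types can be compared by their average reward:

```python
import random

def run_policy(pick_action, true_means, steps=2000, seed=0):
    """Average reward of a fixed action-selection rule on a Gaussian bandit."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        a = pick_action(rng)
        total += rng.gauss(true_means[a], 1.0)  # noisy reward for chosen arm
    return total / steps

means = [0.1, 0.9]                             # hypothetical true action values
random_pi = lambda rng: rng.randrange(2)               # uniform over actions
better_pi = lambda rng: 1 if rng.random() < 0.8 else 0  # mostly the best arm
worse_pi  = lambda rng: 0 if rng.random() < 0.8 else 1  # mostly the worst arm

avg_random = run_policy(random_pi, means)
avg_better = run_policy(better_pi, means)
avg_worse  = run_policy(worse_pi, means)
```

With enough steps the ordering of average rewards matches the policies' expected values (0.74, 0.5, and 0.26 here).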

### Ex1 --- Exploration, Exploitation and Action Selection

Exploring the effects of exploration, exploitation and action selection within the k-arm bandit environment --- epsilon-greedy policies, Q-value initialization, UCB action selection.
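The core of the epsilon-greedy part can be sketched as follows (a simplified stand-in, not the assignment code; arm means are hypothetical):

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a k-armed Gaussian bandit
    (unit variance), with incremental sample-average Q estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k                                  # estimated action values
    N = [0] * k                                    # times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore uniformly
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit current estimate
        r = rng.gauss(true_means[a], 1.0)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental sample mean
    return Q, N

Q, N = epsilon_greedy_bandit([0.2, 0.8, 0.5])
```

Optimistic Q-value initialization or a UCB bonus would replace the `Q = [0.0] * k` line and the greedy `max`, respectively.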

Note: Ex2 was a written assignment, so it is not included here.

### Ex3 --- Dynamic Programming + Policy Iteration

Implementing dynamic programming policy iteration in a grid world environment --- value iteration, transition probabilities, policy evaluation and improvement.
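The value-iteration piece can be sketched on a small deterministic grid world (a hypothetical setup, not the assignment's grid): terminal at the top-left corner, reward -1 per step, so optimal values are the negated Manhattan distances to the goal.

```python
def value_iteration(n=4, gamma=1.0, theta=1e-6):
    """Value iteration on an n x n grid world: terminal at (0, 0),
    reward -1 per step, deterministic moves that stay in place off-grid."""
    terminal = (0, 0)
    V = {(r, c): 0.0 for r in range(n) for c in range(n)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while True:
        delta = 0.0
        for s in V:
            if s == terminal:
                continue
            # Bellman optimality backup: best one-step lookahead
            best = max(
                -1.0 + gamma * V[(min(max(s[0] + dr, 0), n - 1),
                                 min(max(s[1] + dc, 0), n - 1))]
                for dr, dc in moves
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:                # stop once the sweep barely changes V
            break
    return V
```

A stochastic environment would replace the single lookahead term with a sum over transition probabilities; policy iteration alternates full policy evaluation with greedy improvement instead of taking the `max` every sweep.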

### Ex4 --- Monte Carlo Control

Implementing Monte Carlo policy iteration in Blackjack, four-rooms, and racetrack environments --- first-visit MC, exploring starts, MC policy iteration.
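The first-visit bookkeeping can be sketched on a simple random walk (a stand-in for the assignment's Blackjack/four-rooms/racetrack environments): average the return observed after the *first* visit to each state in every episode.

```python
import random

def first_visit_mc(episodes=5000, gamma=1.0, seed=0):
    """First-visit Monte Carlo prediction for a 5-state random walk under
    the uniform-random policy: states 0..4, start at 2, terminals 0
    (return 0) and 4 (return 1). True values are 0.25, 0.5, 0.75."""
    rng = random.Random(seed)
    returns = {s: [] for s in (1, 2, 3)}          # sampled returns per state
    for _ in range(episodes):
        s, traj = 2, []
        while s not in (0, 4):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 4 else 0.0           # reward on entering state 4
            traj.append((s, r))
            s = s2
        first = {}
        for t, (st, _) in enumerate(traj):
            first.setdefault(st, t)               # time of each first visit
        G = 0.0
        for t in range(len(traj) - 1, -1, -1):    # accumulate returns backward
            st, r = traj[t]
            G = gamma * G + r
            if first[st] == t:                    # first-visit: record once
                returns[st].append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}
```

MC control with exploring starts adds the same machinery over state-action pairs plus greedy policy improvement between episodes.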

### Ex5 --- Q-Learning, SARSA, Expected SARSA and Bias/Variance in Temporal-Difference and Monte Carlo Methods

Implementing Q-learning, SARSA and expected SARSA policies in a windy grid world environment. Exploring the bias-variance trade-off between temporal-difference and Monte Carlo methods.
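The three algorithms share one update shape and differ only in the bootstrap target. A minimal tabular sketch (Q as a mapping from state to a list of action values; all names hypothetical):

```python
def q_learning_update(Q, s, a, r, s2, alpha, gamma):
    """Off-policy: bootstrap from the best next action (max)."""
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    """On-policy: bootstrap from the next action actually chosen."""
    Q[s][a] += alpha * (r + gamma * Q[s2][a2] - Q[s][a])

def expected_sarsa_update(Q, s, a, r, s2, alpha, gamma, epsilon):
    """Bootstrap from the expectation under the epsilon-greedy policy,
    removing the sampling variance of SARSA's next-action draw."""
    k = len(Q[s2])
    best = max(range(k), key=lambda i: Q[s2][i])
    probs = [epsilon / k + (1.0 - epsilon) * (i == best) for i in range(k)]
    expected = sum(p * q for p, q in zip(probs, Q[s2]))
    Q[s][a] += alpha * (r + gamma * expected - Q[s][a])
```

The bias-variance contrast falls out of these targets: one-step TD targets are low-variance but biased by the current estimates, whereas Monte Carlo returns are unbiased but high-variance.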

### Ex6 --- Dyna-Q and Dyna-Q+

Implementing the Dyna-Q and Dyna-Q+ algorithms in an adaptive blocking maze environment.
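Dyna-Q interleaves direct Q-learning with planning steps replayed from a learned model. A sketch on a toy corridor (standing in for the blocking maze; all parameters hypothetical):

```python
import random

def dyna_q(n_states=6, episodes=30, planning=20, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a corridor: states 0..n_states-1, actions
    0 = left / 1 = right, reward 1 only for reaching the rightmost
    (terminal) state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}                                   # (s, a) -> (r, s')

    def greedy(s):
        m = max(Q[s])                            # break ties at random
        return rng.choice([a for a in (0, 1) if Q[s][a] == m])

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # direct RL: one-step Q-learning on the real transition
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (r, s2)              # learn deterministic model
            # planning: replay simulated transitions from the model
            for _ in range(planning):
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
    return Q
```

Dyna-Q+ adds an exploration bonus proportional to the square root of the time since each pair was last tried in the real environment, which is what lets it re-discover paths when the maze changes.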

### Ex7 --- Semi-gradient SARSA, State Aggregation and Linear Function Approximation

Implementing semi-gradient SARSA with state-aggregation and linear function-approximation methods.
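With state aggregation, the linear features are one-hot over groups of adjacent states, so the semi-gradient update touches exactly one weight per action. A sketch on a hypothetical corridor task (not the assignment's environment):

```python
import random

def semigrad_sarsa(n_states=12, n_groups=4, episodes=200, alpha=0.1,
                   gamma=0.9, epsilon=0.1, seed=0):
    """Semi-gradient SARSA on a corridor (reward 1 at the right end) with
    linear q-hat over state-aggregation features: adjacent states share
    one weight per action, so the gradient is a one-hot vector."""
    rng = random.Random(seed)
    w = [[0.0] * n_groups for _ in range(2)]      # one weight row per action

    def group(s):
        return s * n_groups // n_states           # aggregate adjacent states

    def q(s, a):
        return w[a][group(s)]                     # one-hot features -> lookup

    def policy(s):
        if rng.random() < epsilon:
            return rng.randrange(2)
        m = max(q(s, 0), q(s, 1))                 # greedy, random tie-break
        return rng.choice([a for a in (0, 1) if q(s, a) == m])

    for _ in range(episodes):
        s = 0
        a = policy(s)
        while True:
            s2 = max(s - 1, 0) if a == 0 else s + 1
            if s2 == n_states - 1:                # terminal: target is reward
                w[a][group(s)] += alpha * (1.0 - q(s, a))
                break
            a2 = policy(s2)
            # semi-gradient: bootstrap target treated as a constant,
            # gradient taken only through q(s, a)
            w[a][group(s)] += alpha * (gamma * q(s2, a2) - q(s, a))
            s, a = s2, a2
    return w
```

Richer linear schemes (tile coding, polynomial features) change only the feature map; the update rule is unchanged.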

### Ex8 --- Deep Q-Learning Networks (DQNs)

Implementing DQNs using PyTorch for non-linear function approximation --- epsilon schedules, replay buffers, optimization.
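Two of the building blocks named here --- a replay buffer and an epsilon schedule --- are framework-independent and can be sketched in plain Python (the PyTorch network and optimization loop are omitted; names and parameters are hypothetical):

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: keep the most recent `capacity`
    transitions and sample minibatches for the DQN update, breaking
    the correlation between consecutive environment steps."""
    def __init__(self, capacity, seed=0):
        self.buf = deque(maxlen=capacity)         # old transitions evicted
        self.rng = random.Random(seed)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

def linear_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly anneal exploration from `start` to `end`, then hold."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

In a full DQN loop, each environment step pushes a transition, samples a batch, and takes a gradient step on the TD error against a periodically-updated target network.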