Multi-Armed-Bandit-Problem

Python implementation and observation results for different algorithms in k - Armed Bandits Problem Implemented algorithm first is the epsilon-greedy algorithm, where the bandit with highest action value is selected (1-epsilon) times and a random bandit is selected epsilon times. Increasing the value of epsilon increases exploration, but decreases exploitation. The below image clearly shows the difference-

To implement this in Bandits.py, tweak with the value of the variable epsilon2

Initially, \epsilon = 0.01 performs worse as it hasn't explored every bandit yet. But over time, it performs better than \epsilon = 0.1, as it selects the best bandit only 91% of the time.

Another way to make the algorithm better is to implement Optimistic Initial Values Algorithm by taking- temp_Q2 = np.zeros(num_band) + 10 This makes sure exploration is done before exploitation. Now combining Optimistic Initial Values and \epsilon = 0.01, we get-

Clearly, now the \epsilon=0.01 performs better than \epsilon=0.1 from the start To implement this algorithm, add a number to the zero vector temp_Q. Tweak with the value of the number. The graph above is for temp_Q2 += 10

Another algorithm that can be used is exponential recency weighted average, which gives more preference to latest values, rather than older values. Non- Stationary problems work better with this algorithm, but worse than \epsilon - greedy on stationary problems. Here is an implementation of it on a stationary problem-

As this was a static problem, it performs far worse than our first algorithm

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
BanditsA2.png		BanditsA2.png
BanditsB2.png		BanditsB2.png
BanditsC2.png		BanditsC2.png
README.md		README.md
bandits.py		bandits.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-Armed-Bandit-Problem

About

Releases

Packages

Languages

Sadiq0123/Multi-Armed-Bandit-Problem

Folders and files

Latest commit

History

Repository files navigation

Multi-Armed-Bandit-Problem

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages