Releases: ntucllab/striatum
0.2.4
Change
- move reward from History to Recommendation (#121)
0.2.3
New
- BaseBandit supports update_action and remove_action methods (#120)
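The update_action/remove_action methods let the action set change while the policy runs. A minimal illustration of the idea — a toy in-memory action storage, not striatum's actual implementation; all names here are hypothetical:

```python
class ToyActionStorage:
    """Toy in-memory action storage illustrating a mutable action set."""

    def __init__(self):
        self._actions = {}

    def add(self, action_id, action):
        self._actions[action_id] = action

    def update(self, action_id, action):
        # Replace the stored action object for an existing id.
        if action_id not in self._actions:
            raise KeyError(action_id)
        self._actions[action_id] = action

    def remove(self, action_id):
        # Drop an action so the policy stops recommending it.
        del self._actions[action_id]

    def ids(self):
        return list(self._actions)


storage = ToyActionStorage()
storage.add(1, "article-a")
storage.add(2, "article-b")
storage.update(1, "article-a-v2")
storage.remove(2)
print(storage.ids())  # [1]
```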
Change
- recommendation class (#117)
Fix
- fix and unify the return value of BaseBandit.get_action() when the action storage is empty (#118)
0.2.2
Fix
- unify the data usage in all simulations, fixed #95 (#107)
- multi-action history, fixed #58 (#110)
- fix rewardplot.calculate_cum_reward (#112)
- remove query_vector in the Exp3 model and use the history instead, fixed #108 (#112)
Change
- History and HistoryStorage interfaces are changed (#110)
0.2.1
Fix
- fix striatum.utils import error (#105)
0.2.0
Change
- ActionStorage (#99)
- reorder bandit __init__ parameters and change some default values (#102)
- default n_actions in BaseBandit.get_action is changed to None (#103)
- BaseBandit.get_action adds support for n_actions=None and n_actions=-1 (#103)
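The new n_actions values can be illustrated with a toy ranking function. This is only a sketch of the apparent semantics (n_actions=None yields the single best action, n_actions=-1 yields all actions ranked), not striatum's code:

```python
def get_action(scores, n_actions=None):
    """Toy get_action: rank action ids by score.

    n_actions=None -> the single best action id
    n_actions=-1   -> all action ids, best first
    n_actions=k    -> the top-k action ids
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if n_actions is None:
        return ranked[0]
    if n_actions == -1:
        return ranked
    return ranked[:n_actions]


scores = {"a": 0.2, "b": 0.9, "c": 0.5}
print(get_action(scores))      # 'b'
print(get_action(scores, -1))  # ['b', 'c', 'a']
print(get_action(scores, 2))   # ['b', 'c']
```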
Fix
- remove the generators in Exp3 and UCB1 (#101)
0.1.1
Fix
- fix exp3 random_state; fixed #86 (#91)
- correct setup configuration; fixed #87 (#88)
0.1.0
New
- add reward calculation and plot methods in each bandit policy (#38)
- add Travis CI using tox (#44)
- Python3 support (#44)
- generate documentation using Sphinx and host it on Read the Docs (#44)
- BaseBandit.get_action_with_id (#63)
Change
- refactor bandit algorithms to allow multiple actions and rewards (#37)
- use Action object instead of action_id during policy initialization (#37)
- use "expert advice probability vectors" instead of "scikit-learn models" as input for Exp4p (#37)
- better simulation coding style (#47, #51, #53, #56, #72, #73, #74, #75, #76, #78)
- better coding style (#50, #62, #66, #79)
Fix
- fix the parameter updating bugs (query_vector calculation) in Exp4p (#37)
- fix bugs in Exp4.P (#66)
- remove generator in LinUCB (#54)
- remove generator in Exp4.P (#67)
- remove generator in LinTompSamp (#80)
0.0.1
New
- implement LinUCB (#10)
- implement UCB1 (#10)
- implement EXP3 (#10)
- implement EXP4P (#10)
- implement Thompson Sampling for Contextual Bandits with Linear Payoffs (#10)
- provide unit tests for each bandit algorithm (#10)
- simulation on fake data (#10)
- benchmark using MovieLens (#29)
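For reference, the UCB1 policy listed above can be sketched in a few lines — a textbook (context-free) implementation for illustration, not striatum's code:

```python
import math
import random


def ucb1(pull, n_arms, n_rounds):
    """Textbook UCB1: play each arm once, then repeatedly pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_pulls)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(n_rounds):
        if t < n_arms:
            arm = t  # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts


random.seed(0)
means = [0.2, 0.5, 0.8]  # Bernoulli reward probabilities per arm
counts = ucb1(lambda a: 1.0 if random.random() < means[a] else 0.0, 3, 2000)
# The best arm (index 2) should accumulate the most pulls.
```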