Introduces DQNs for control from raw-pixel inputs. The algorithm uses Bellman updates to learn an action-value function and selects actions off-policy using the learned action-value function. Demonstrates how experience replay gives the algorithm a more even distribution of experience (e.g. avoiding "feedback loops" between the current policy and the data it is trained on).
Learns to play a large number of Atari games. However, the algorithm is extremely sample-inefficient.
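As an illustration, here is a minimal sketch of uniform experience replay and the one-step Bellman target; the capacity, batch size, and discount factor are illustrative defaults, not values taken from the paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: store transitions, sample i.i.d. minibatches."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks the temporal correlation between consecutive
        # frames -- the "feedback loop" the paper warns about.
        return random.sample(self.buffer, batch_size)

def bellman_target(reward, next_q_values, done, gamma=0.99):
    """One-step Bellman target r + gamma * max_a' Q(s', a'); no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * max(next_q_values))
```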
Introduces prioritized experience replay, an improved version of the uniform experience replay strategy used in the DQN paper. In prioritized experience replay, transitions in the replay buffer are weighted by their TD error, which measures how surprising the transition was, so surprising transitions are replayed more often.
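A toy sketch of the proportional variant is below. The naive list-based storage is only for clarity (a practical implementation uses a sum-tree for O(log N) sampling), and the hyperparameter defaults here are illustrative assumptions:

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Sketch of proportional prioritized replay: P(i) proportional to |td_error_i|^alpha."""

    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size=32, beta=0.4):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, d in zip(idx, td_errors):
            self.priorities[i] = (abs(d) + self.eps) ** self.alpha
```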