Curiosity-driven Exploration in Drone Navigation

Requirements

Environments

This repository includes the following environments, each available in Sparse and Dense reward modes.

First-person Third-person Top-down

Contents

  • Advantage Actor Critic (A2C)
  • Proximal Policy Optimization (PPO)
  • Intrinsic Curiosity Module (ICM)
  • Random Network Distillation (RND)
  • Universal Value Function Approximators (UVFA)
  • Never Give Up (NGU)

Experiments

CartPole-v1

The CartPole environment has been modified: its time-based reward is replaced by a sparse reward scheme that returns only the last reward of each episode.
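The sparse-reward modification described above can be sketched as a thin wrapper around a Gym-style environment. This is only an illustration under stated assumptions: the class name `SparseRewardWrapper` is hypothetical, the wrapped environment is assumed to return `(obs, reward, terminated, truncated, info)` from `step`, and the single end-of-episode reward is assumed to be the accumulated return. The repository's own implementation may differ.

```python
class SparseRewardWrapper:
    """Hypothetical wrapper: hide per-step rewards, emit one reward at episode end."""

    def __init__(self, env):
        self.env = env
        self._episode_return = 0.0

    def reset(self, **kwargs):
        # Start a fresh accumulator for the new episode.
        self._episode_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._episode_return += reward
        done = terminated or truncated
        # Assumption: the final reward is the accumulated episode return;
        # every intermediate step yields 0.
        sparse_reward = self._episode_return if done else 0.0
        return obs, sparse_reward, terminated, truncated, info
```

With a wrapper like this, the agent receives no learning signal until the pole falls or the time limit is reached, which is the setting where an intrinsic curiosity bonus is expected to help.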

Montezuma's Revenge

The goal is to acquire Montezuma’s treasure by making your way through a maze of chambers within the emperor’s fortress. The player must avoid deadly creatures while collecting valuables and tools that can help them escape with the treasure.

VizDoom

VizDoom is a Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information.

Basic

This map is a rectangular room with walls, a ceiling and a floor. The player is spawned at the center of the longer wall. A circular monster is spawned at a random position along the opposite wall. The player can only move left or right and shoot. One hit is enough to kill the monster, and the episode ends when the monster is killed or on timeout.

basic-vd.mp4

Defend the Center

This map is a large circular environment. The player is spawned in the exact center. Five melee-only monsters are spawned along the wall. Each monster is killed by a single shot and respawns some time after dying. The episode ends when the player dies.

Deadly Corridor

This map is a corridor with shooting monsters on both sides (6 monsters in total). A green vest is placed at the opposite end of the corridor. The reward is proportional (positive or negative) to the change in distance between the player and the vest. If the player ignores the monsters on the sides and runs straight for the vest, they will be killed somewhere along the way.
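The distance-based reward described above can be illustrated with a small helper. It is a sketch only: in practice the reward is produced by the scenario itself, and the function name `distance_delta_reward`, the 2D positions, and the `scale` factor are all hypothetical.

```python
import math


def distance_delta_reward(prev_pos, curr_pos, vest_pos, scale=1.0):
    """Reward proportional to how much closer the player moved to the vest.

    Positive when the distance shrinks, negative when it grows.
    `scale` is an assumed proportionality constant.
    """
    prev_dist = math.dist(prev_pos, vest_pos)
    curr_dist = math.dist(curr_pos, vest_pos)
    return scale * (prev_dist - curr_dist)
```

For example, moving from `(0, 0)` to `(1, 0)` with the vest at `(10, 0)` gives a reward of `1.0`, and stepping back gives `-1.0`. Because this signal rewards running straight ahead, an agent that follows it greedily is the one that gets killed along the way.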

UE4 Airsim Maze

The goal is to explore through a labyrinth and find the terminal square. Along the way, the agent should avoid colliding with walls; otherwise, the environment will reset the episode, and a -1 reward will be given.
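The collision rule above can be summarized in a few lines. This is a hedged sketch, not the repository's environment code: the function name `maze_step_outcome`, the `goal_reward` value, and the zero reward on ordinary steps are assumptions; only the -1 penalty and the episode reset on collision come from the description.

```python
def maze_step_outcome(collided, reached_goal, goal_reward=1.0):
    """Return (reward, done) for a single step in the maze."""
    if collided:
        # From the description: a collision gives -1 and resets the episode.
        return -1.0, True
    if reached_goal:
        # Assumption: reaching the terminal square ends the episode
        # with a positive reward of `goal_reward`.
        return goal_reward, True
    # Assumption: no reward on ordinary steps (sparse mode).
    return 0.0, False
```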

thirdp.mp4
firstp.mp4

drone-len

drone-rew

drone-ppo

drone-icm

References

  1. Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. Curiosity-driven Exploration by Self-supervised Prediction. 34th Int. Conf. Mach. Learn. ICML 2017 6, 4261–4270 (2017).
  2. Burda, Y., Edwards, H., Storkey, A. & Klimov, O. Exploration by Random Network Distillation. 7th Int. Conf. Learn. Represent. ICLR 2019 1–17 (2018).
  3. Burda, Y., Storkey, A., Darrell, T. & Efros, A. A. Large-scale study of curiosity-driven learning. 7th Int. Conf. Learn. Represent. ICLR 2019 (2019).
  4. Badia, A. P. et al. Never Give Up: Learning Directed Exploration Strategies. 1–28 (2020).