
[bug report (unsolved)/question] SRL training slows down per epoch (Memory leak?). #47

Open
ncble opened this issue May 24, 2019 · 1 comment
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@ncble (Collaborator) commented on May 24, 2019

Describe the bug
Training of the SRL algorithm slows down by a few seconds per epoch: the first epoch takes 31 s, then the epoch time grows almost linearly to 60 s by the end of 30 epochs. I also looked at the memory usage: at the start, around 3 GB of GPU memory is used, but it grows to 5 GB by the end of the first epoch. Is this normal?
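For reference, a minimal way to quantify this is to log wall time and allocated GPU memory at the end of every epoch. The sketch below assumes a PyTorch setup (srl_zoo's training loop is PyTorch-based); `log_epoch_stats` is a hypothetical helper, not part of the project:

```python
import time
import torch

def log_epoch_stats(epoch, epoch_start_time):
    """Print per-epoch wall time and, if a GPU is available, current/peak allocated memory."""
    msg = "epoch {}: {:.1f} s".format(epoch, time.time() - epoch_start_time)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure all queued kernels are accounted for
        allocated_gb = torch.cuda.memory_allocated() / 1024 ** 3
        peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
        msg += ", allocated {:.2f} GB, peak {:.2f} GB".format(allocated_gb, peak_gb)
    print(msg)
```

Calling something like this at the end of each epoch in train.py would show whether allocated memory really keeps climbing epoch after epoch or plateaus after the first one.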

Code example
The following commands reproduce the problem.

  • Under robotics-rl-srl/
    $ python -m environments.dataset_generator --env MobileRobotGymEnv-v0 --name mobile2D_fixed_tar_seed_0 --seed 0 --num-cpu 8

  • Under srl_zoo/
    $ python train.py --data-folder mobile2D_fixed_tar_seed_0 --losses autoencoder

System Info

  • GPU: RTX 2080 Ti
  • Python 3.6.8
  • TensorFlow version: 1.13.1
@araffin (Owner) commented on May 24, 2019

> I also looked at the memory usage: at the start, around 3 GB of GPU memory is used, but it grows to 5 GB by the end of the first epoch. Is this normal?

looks like a memory leak...
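Not claiming this is the cause in srl_zoo, but for context: a common way this symptom shows up in PyTorch training loops is accumulating loss tensors (or other graph-attached tensors) across batches, which keeps every batch's autograd graph alive on the GPU. A minimal, self-contained illustration with a toy model (not the project's code):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()

epoch_losses = []
for step in range(1000):
    x = torch.randn(32, 10, device=device)
    y = torch.randn(32, 1, device=device)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # Leaky: appending the tensor itself keeps its whole autograd graph alive,
    # so GPU memory grows steadily over the run.
    # epoch_losses.append(loss)
    # Safe: store a plain Python float instead.
    epoch_losses.append(loss.item())
```

Checking where the training loop stores losses or intermediate tensors (and whether `.item()` / `.detach()` is used) is usually the first thing to rule out.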
