
[bug report (unsolved)/question] SRL training slows down per epoch (Memory leak?). #47

Open
ncble opened this issue May 24, 2019 · 1 comment
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@ncble (Collaborator) commented on May 24, 2019

Describe the bug
Training of the SRL algorithm slows down by a few seconds per epoch: the first epoch takes 31 s, then the epoch time grows almost linearly to 60 s by the end of 30 epochs. I also looked at the memory usage: at the start, around 3 GB of GPU memory is used, but it grows to 5 GB by the end of the first epoch. Is this normal?
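For reference, a minimal way to quantify this is to log wall time and allocated GPU memory at the end of every epoch. The sketch below assumes a PyTorch setup (srl_zoo's training loop is PyTorch-based); `log_epoch_stats` is a hypothetical helper, not part of the project:

```python
import time
import torch

def log_epoch_stats(epoch, epoch_start_time):
    """Print per-epoch wall time and, if a GPU is available, current/peak allocated memory."""
    msg = "epoch {}: {:.1f} s".format(epoch, time.time() - epoch_start_time)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure all queued kernels are accounted for
        allocated_gb = torch.cuda.memory_allocated() / 1024 ** 3
        peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
        msg += ", allocated {:.2f} GB, peak {:.2f} GB".format(allocated_gb, peak_gb)
    print(msg)
```

Calling something like this at the end of each epoch in train.py would show whether allocated memory really keeps climbing epoch after epoch or plateaus after the first one.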

Code example
The following commands reproduce the problem.

  • Under robotics-rl-srl/
    $ python -m environments.dataset_generator --env MobileRobotGymEnv-v0 --name mobile2D_fixed_tar_seed_0 --seed 0 --num-cpu 8

  • Under srl_zoo/
    $ python train.py --data-folder mobile2D_fixed_tar_seed_0 --losses autoencoder

System Info

  • GPU: RTX 2080 Ti
  • Python 3.6.8
  • TensorFlow version: 1.13.1
@araffin (Owner) commented on May 24, 2019

> I also looked at the memory usage: at the start, around 3 GB of GPU memory is used, but it grows to 5 GB by the end of the first epoch. Is this normal?

looks like a memory leak...
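Not claiming this is the cause in srl_zoo, but for context: a common way this symptom shows up in PyTorch training loops is accumulating loss tensors (or other graph-attached tensors) across batches, which keeps every batch's autograd graph alive on the GPU. A minimal, self-contained illustration with a toy model (not the project's code):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()

epoch_losses = []
for step in range(1000):
    x = torch.randn(32, 10, device=device)
    y = torch.randn(32, 1, device=device)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    # Leaky: appending the tensor itself keeps its whole autograd graph alive,
    # so GPU memory grows steadily over the run.
    # epoch_losses.append(loss)
    # Safe: store a plain Python float instead.
    epoch_losses.append(loss.item())
```

Checking where the training loop stores losses or intermediate tensors (and whether `.item()` / `.detach()` is used) is usually the first thing to rule out.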
