Training too slow and not using full GPU — what's the training time? #2
Comments
From your screenshot, I would say that you didn't use the GPU. Would you try to print a message after the line "if torch.cuda.is_available():"?
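A minimal way to do that check is shown below. This is a sketch, not the repository's actual code; it assumes PyTorch is installed and simply prints which device will be used:

```python
import torch

def pick_device():
    # Print an explicit message so it is obvious which device training will use.
    if torch.cuda.is_available():
        print("CUDA available:", torch.cuda.get_device_name(0))
        return torch.device("cuda")
    print("CUDA NOT available -- falling back to CPU")
    return torch.device("cpu")

device = pick_device()
```

If the message reports a CPU fallback, the training loop never touched the GPU no matter what nvidia-smi shows for other processes.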
I was indeed using the GPU; the 2nd image shows the Python process running on the GPU. The issue is that it was using only 300-400MB of the GPU instead of the full 12GB. I have since deleted my project; because there was no reply, I assumed this was a dead repo.
Sorry for that. I was quite busy over the last months.
Btw, if you want to train a model with 300 categories, increasing the model's width and depth is indispensable.
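One way to widen (more channels) and deepen (more conv blocks) a small classifier in PyTorch is sketched below. The layer sizes and class count are illustrative assumptions, not the repository's actual architecture:

```python
import torch
import torch.nn as nn

class WiderDeeperCNN(nn.Module):
    """Illustrative sketch only: more channels (width) and more conv
    blocks (depth) than a typical small QuickDraw model. All sizes here
    are assumptions, not the repository's real configuration."""
    def __init__(self, num_classes=300, in_size=28):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two 2x poolings shrink each spatial dimension by a factor of 4.
        self.classifier = nn.Linear(128 * (in_size // 4) ** 2, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```

More categories generally call for more capacity, but note that a wider/deeper model will also be slower per iteration, so the GPU-utilization issue above matters even more.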
New to all these terms — can you provide me the parameters to use for the whole dataset? Also, the way I wanted to use it was with saved image files: load a saved .png/.jpg file of a drawing, feed it to the network, and get the result. Somehow, all the repos I have come across use either a webcam approach, JavaScript, or recording/storing drawing coordinates, and none had an approach for saved image files (whether hand drawings or QuickDraw dataset samples saved as images). I tried to pre-process the images to feed into the network, but the result is not the same.
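A common source of the "result is not the same" problem is a preprocessing mismatch. The sketch below is one hedged guess at a compatible pipeline; it assumes the network expects 28x28 grayscale input with white strokes on a black background (as in rasterized QuickDraw bitmaps), which may not match this repository's training setup:

```python
import numpy as np
import torch
from PIL import Image

def image_to_tensor(img, size=28):
    """Turn a saved drawing (a PIL Image or a path to a .png/.jpg) into a
    1x1xHxW float tensor. ASSUMPTION: the network was trained on white
    strokes over a black background, so paper-style drawings are inverted."""
    if isinstance(img, str):
        img = Image.open(img)
    img = img.convert("L").resize((size, size), Image.BILINEAR)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = 1.0 - arr  # invert: drawings are usually dark ink on white paper
    return torch.from_numpy(arr).unsqueeze(0).unsqueeze(0)
```

If the model was trained with a different input size, polarity, or normalization, those must be matched exactly; even an un-inverted background is enough to make predictions look random.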
Using default parameters and all 300 categories, training feels quite slow even though I am using an AWS EC2 p2.xlarge instance with an Nvidia K80 GPU.
It's using only 360MB of the GPU, and the usage seems stuck at that number — it neither rises nor falls (checked via the nvidia-smi command).
I tried measuring the time between iterations: it's 5-7 seconds each. Multiplying that by the total number of iterations across 20 epochs gives an estimate of more than 150 days.