
How do I know the GPU is being used when I run the Deep Learning ... notebook? #18

Open
nateGeorge opened this issue Sep 21, 2016 · 3 comments

Comments


nateGeorge commented Sep 21, 2016

I'm trying to run the Deep Learning demo notebook, and training is taking a really long time. It also doesn't look like it's using the GPU. I'm on an Amazon EC2 g2.2xlarge with an NVIDIA Corporation GK104GL [GRID K520] (rev a1). I tried some of the solutions here: karpathy/char-rnn#89, like

require 'cunn'
require 'cutorch'

and th -l cutorch and th -l cunn from the command line. However, when I run the line

trainer:train(trainset)

it just seems to sit there in progress and doesn't go anywhere. I also checked the GPU usage with nvidia-smi, and it looks like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.77                 Driver Version: 361.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   31C    P8    26W / 125W |    121MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7379    C   /home/ubuntu/torch/install/bin/luajit          119MiB |
+-----------------------------------------------------------------------------+

Memory usage jumps and the luajit PID appears after require 'cutorch', but memory usage never increases after that, and GPU-Util sits at 0%. I have CUDA installed; nvcc --version gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

This is running on Ubuntu 16.04. I verified the CUDA samples build and run without errors.
Any ideas why it wouldn't be using the GPU?

@pankajkumar

+1

@mhmtsarigul

Are you sure you converted your network and criterion to CUDA?
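Requiring cunn/cutorch alone doesn't move anything onto the GPU. A minimal sketch of what the conversion looks like, assuming net, criterion, and trainset are defined as in the Torch 60-minute-blitz-style notebook (a nn module, a criterion, and a trainset with .data and .label tensors):

```lua
require 'nn'
require 'cunn'
require 'cutorch'

-- Move the model and the loss criterion to the GPU.
net = net:cuda()
criterion = criterion:cuda()

-- The training data must live on the GPU too, or every forward
-- pass will fail (or silently run on CPU, depending on setup).
trainset.data = trainset.data:cuda()
trainset.label = trainset.label:cuda()

-- Now the trainer actually exercises the GPU.
trainer = nn.StochasticGradient(net, criterion)
trainer:train(trainset)
```

If any of these :cuda() calls are missing, the tensors stay as CPU DoubleTensor/FloatTensor and nvidia-smi will show 0% utilization, which matches what you're seeing.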

@TheRum

TheRum commented Apr 21, 2018

nvidia-smi only reports a snapshot of the stats at the moment you run it. Put it in a loop or use watch -n 1 nvidia-smi, if you haven't already tried that.
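To watch utilization continuously while training runs, either of these works (the query flags below are standard nvidia-smi options; the 1-second interval is just a reasonable choice):

```shell
# Refresh the full nvidia-smi table every second
watch -n 1 nvidia-smi

# Or print just GPU utilization and memory use, once per second
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```

If GPU-Util stays at 0% even while trainer:train is running, the work is happening on the CPU.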
