failed to create cublas handle: the resource allocation failed #4

Open
blinor opened this issue Jun 6, 2024 · 0 comments

blinor commented Jun 6, 2024

Hey there,
I am trying to run a simple TensorFlow training job in a Docker container with fractional-gpu. No matter which one I use, I always get:
```
>>> model.fit(x_train, y_train, epochs=50, batch_size=1000)
Epoch 1/50
2024-06-06 10:53:20.251154: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:185] failed to create cublas handle: the resource allocation failed
2024-06-06 10:53:20.251203: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:188] Failure to initialize cublas may be due to OOM (cublas needs some free memory when you initialize it, and your deep-learning framework may have preallocated more than its fair share), or may be because this binary was not built with support for the GPU in your machine.
2024-06-06 10:53:20.251227: W external/local_xla/xla/stream_executor/stream.cc:1020] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
Detected at node sequential/dense/MatMul defined at (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1807, in fit
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1401, in train_function
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1384, in step_function
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1373, in run_step
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1150, in train_step
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 590, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer.py", line 1149, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/sequential.py", line 398, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/functional.py", line 515, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/functional.py", line 672, in _run_internal_graph
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer.py", line 1149, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/dense.py", line 241, in call
Blas xGEMV launch failed : a.shape=[1,1000,784], b.shape=[1,784,1], m=1000, n=1, k=784
[[{{node sequential/dense/MatMul}}]] [Op:__inference_train_function_932]
```
With the official tensorflow/tensorflow:latest-gpu image, everything works as expected.
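
Since the log above names OOM at cuBLAS initialization as one possible cause, one thing that may be worth ruling out is TensorFlow's default behavior of reserving most of the GPU memory up front. The sketch below is not a confirmed fix for this issue. It only assumes that memory preallocation is involved, which the log does not prove. It uses the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable and `tf.config.experimental.set_memory_growth`; the helper name `enable_memory_growth` is hypothetical.

```python
import os

# Must be set before TensorFlow initializes the GPU. It asks TensorFlow to
# grow its GPU memory allocation on demand instead of reserving it up front.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def enable_memory_growth():
    """Enable memory growth on each visible GPU; return how many succeeded."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed in this environment

    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError:
            # The GPU was already initialized, so the setting cannot change.
            pass
    return configured


configured = enable_memory_growth()
print(f"GPUs configured for memory growth: {configured}")
```

This would need to run at the very start of the session, before `model.fit` or any other call that touches the GPU. If the error persists with memory growth enabled, the other cause named in the log (a binary built without support for this GPU) becomes the more likely candidate.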
