Hey there,
I am trying to run a simple TensorFlow training job in a Docker container with fractional-gpu. No matter which container I use, I always get:
```
>>> model.fit(x_train, y_train, epochs=50, batch_size=1000)
Epoch 1/50
2024-06-06 10:53:20.251154: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:185] failed to create cublas handle: the resource allocation failed
2024-06-06 10:53:20.251203: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:188] Failure to initialize cublas may be due to OOM (cublas needs some free memory when you initialize it, and your deep-learning framework may have preallocated more than its fair share), or may be because this binary was not built with support for the GPU in your machine.
2024-06-06 10:53:20.251227: W external/local_xla/xla/stream_executor/stream.cc:1020] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
Detected at node sequential/dense/MatMul defined at (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1807, in fit
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1401, in train_function
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1384, in step_function
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1373, in run_step
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1150, in train_step
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 590, in call
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer.py", line 1149, in call
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/sequential.py", line 398, in call
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/functional.py", line 515, in call
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/functional.py", line 672, in _run_internal_graph
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
  File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer.py", line 1149, in call
  File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
  File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/dense.py", line 241, in call
Blas xGEMV launch failed : a.shape=[1,1000,784], b.shape=[1,784,1], m=1000, n=1, k=784
[[{{node sequential/dense/MatMul}}]] [Op:__inference_train_function_932]
```
With the official `tensorflow/tensorflow:latest-gpu` image, everything works as expected.
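One thing that might be worth trying, offered as an untested guess rather than a confirmed fix: the cuBLAS message itself says the failure may be caused by the framework having "preallocated more than its fair share", and TensorFlow by default reserves most of the visible GPU memory up front. Assuming the fractional-gpu container enforces a per-process memory cap smaller than that reservation, asking TensorFlow to grow its allocation on demand could leave room for cuBLAS to create its handle. The sketch below uses TensorFlow's documented `TF_FORCE_GPU_ALLOW_GROWTH` environment variable and `tf.config.experimental.set_memory_growth`; the helper name `enable_tf_memory_growth` is hypothetical.

```python
import os


def enable_tf_memory_growth() -> None:
    """Hypothetical helper: make TensorFlow allocate GPU memory on demand.

    Must run before TensorFlow initializes the GPU (i.e. before building the
    model or calling model.fit); otherwise set_memory_growth raises RuntimeError.
    """
    # Environment-variable route, read by TensorFlow when it sets up the GPU.
    # Setting it first also covers code that imports TensorFlow later.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed here; the env var is still in place.
        return

    # Programmatic route: enable memory growth on every visible GPU.
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)


enable_tf_memory_growth()
```

The same setting can also be passed from outside the script, e.g. `docker run -e TF_FORCE_GPU_ALLOW_GROWTH=true ...`, which avoids changing the training code when comparing the fractional-gpu image against `tensorflow/tensorflow:latest-gpu`.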