
GPU inference not working #949

Open
charrezde opened this issue Dec 4, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@charrezde

Summary

Compilation works on CUDA, but FHE inference is not possible on CUDA: the forward pass converts its input to numpy by default, which only works on the CPU.
Will there be support for running inference on GPUs?

Description

  • versions affected: 1.7.0
  • python version: 3.10
  • config (optional: HW, OS): Ubuntu 24.04.1 LTS
  • workaround (optional): move the input to the CPU
  • proposed fix (optional):

Step by step procedure someone should follow to trigger the bug:

Minimal POC to trigger the bug:

# `model` and `images` (a torch model and an input batch) are defined elsewhere.
from concrete.fhe import Configuration
from concrete.ml.torch.compile import compile_torch_model

# Compile for GPU: the model and calibration inputs stay on the CPU,
# while the FHE circuit is compiled for the CUDA device.
images = images.cpu()
model = model.cpu()
compilation_device = "cuda"
n_bits = 6
rounding_threshold_bits = 6
compile_config = {
    "n_bits": n_bits,
    "rounding_threshold_bits": (
        {"n_bits": rounding_threshold_bits, "method": "APPROXIMATE"}
        if rounding_threshold_bits is not None
        else None
    ),
}
config = Configuration(enable_tlu_fusing=True, print_tlu_fusing=False)
compile_config.update(
    {
        "p_error": 0.05,
        "configuration": config,
    }
)
q_module = compile_torch_model(
    model, torch_inputset=images, **compile_config, device=compilation_device
)

# Inference with inputs on the GPU
images = images.to("cuda")
q_module.forward(images, fhe="disable")  # Error here

# Fix: inference with inputs on the CPU
images = images.cpu()
q_module.forward(images, fhe="disable")

Traceback from the failing call:
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/myuser/homomorphic-encryption/demo/resnet_fhe.py", line 373, in <module>
    main()
  File "/data/myuser/homomorphic-encryption/demo/resnet_fhe.py", line 352, in main
    evaluate_model_cml(q_module, mini_test_dataloader, fhe="disable", device=compilation_device)
  File "/data/myuser/homomorphic-encryption/demo/resnet_fhe.py", line 58, in evaluate_model_cml
    batch_outputs = q_module.forward(data, fhe=fhe)
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 465, in forward
    q_x = to_tuple(self.quantize_input(*x))
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 730, in quantize_input
    q_x = tuple(
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 732, in <genexpr>
    self.input_quantizers[idx].quant(x[idx])  # type: ignore[arg-type]
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py", line 758, in quant
    qvalues = numpy.rint(values / self.scale + self.zero_point)
  File "/data/myuser/homomorphic-encryption/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 1087, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

@charrezde charrezde added the bug Something isn't working label Dec 4, 2024
@jfrery
Collaborator

jfrery commented Dec 4, 2024

Hi @charrezde,

The input you pass to q_module.forward should be a numpy array, not a torch tensor. You can fix this with:

q_module.forward(images.detach().cpu().numpy(), fhe="disable")

However, GPU acceleration applies to the FHE execution itself, so you need to pass fhe="execute" to actually see GPU execution.
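Putting the two together, the corrected call would look like this (a sketch; `images` and `q_module` are the tensor and compiled module from the snippet above):

# Move the tensor to host memory as a numpy array, then request real FHE
# execution, which is the part the GPU backend accelerates.
q_module.forward(images.detach().cpu().numpy(), fhe="execute")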

@charrezde
Author

Hi @jfrery, this does not seem to be working for me:

q_module.forward(images.to("cuda"), fhe="execute")

This is failing due to:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

It converts the inputs to numpy by default.

If I run:

q_module.forward(images.detach().numpy(), fhe="execute")

I get my process killed.

@jfrery
Collaborator

jfrery commented Dec 4, 2024

this does not seem to be working for me

q_module.forward(images.to("cuda"), fhe="execute")
This is failing due to:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
It converts the inputs to numpy by default.

Yes, that is expected. We don't handle torch inputs for inference, only for compilation, so you need to convert them to numpy.
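For example (a sketch using the names from the original POC, with the compile settings abbreviated):

# Compilation accepts torch tensors (on the CPU)...
q_module = compile_torch_model(model.cpu(), torch_inputset=images.cpu(), n_bits=6, device="cuda")
# ...but inference expects numpy arrays in host memory.
q_module.forward(images.detach().cpu().numpy(), fhe="execute")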

If I run:

q_module.forward(images.detach().numpy(), fhe="execute")
I get my process killed.

Alright, I would need more information here:

  • What kind of message do you get?
  • What's your GPU?
  • What model are you trying to compile?
  • And what's the input image size?

It's likely that the GPU memory isn't enough to run the circuit. Encrypted data is much bigger than its plaintext equivalent.
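One way to watch GPU memory while the circuit runs (a minimal sketch; assumes the nvidia-ml-py package, imported as pynvml, is installed — watching nvidia-smi works just as well):

import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
for _ in range(60):  # sample once per second for a minute
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory: {info.used / 2**20:.0f} / {info.total / 2**20:.0f} MiB")
    time.sleep(1)
pynvml.nvmlShutdown()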

@charrezde
Author

  • message I get: Killed
  • GPU: NVIDIA Tesla T4, 16384 MiB
  • model: currently testing with ResNet-18, but I will use bigger models and YOLO detectors
  • image size: (224, 224, 3)

@jfrery
Collaborator

jfrery commented Dec 6, 2024

How many images are you trying to pass to the compilation? You should probably monitor the GPU memory, as I think this is the bottleneck. Maybe also try with a single image for the compilation and a single image for the test (see the sketch below); hopefully that fits within the 16 GB.

We run ResNet-like models on an H100. I am not sure you can do that on a T4.
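A minimal sketch of that suggestion (assuming `model` and `images` from the original POC; the compile settings mirror it):

# Compile and test with a single image to reduce GPU memory pressure.
single_image = images[:1].cpu()
q_module = compile_torch_model(
    model.cpu(),
    torch_inputset=single_image,
    n_bits=6,
    rounding_threshold_bits={"n_bits": 6, "method": "APPROXIMATE"},
    p_error=0.05,
    device="cuda",
)
q_module.forward(single_image.detach().numpy(), fhe="execute")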
