[e2e][ONNX][Model] onnx.Matmul failure to lower #666

zjgarvey · 2024-05-03T15:26:04Z

The IR generated in the RAFT_vaiq_int8 model e2eshark test, shown here:

    %863 = torch.aten._make_per_tensor_quantized_tensor %862, %float3.125000e-02, %int0 : !torch.vtensor<[2,128,32,32],si8>, !torch.float, !torch.int -> !torch.vtensor<[2,128,32,32],!torch.qint8>
    %864 = torch.aten.dequantize.self %863 : !torch.vtensor<[2,128,32,32],!torch.qint8> -> !torch.vtensor<[2,128,32,32],f32>
//    %865 = torch.aten.slice.Tensor %864, %int0, %int0, %int1, %int1 : !torch.vtensor<[2,128,32,32],f32>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1,128,32,32],f32>
    %866 = torch.aten.slice.Tensor %864, %int0, %int1, %int2, %int1 : !torch.vtensor<[2,128,32,32],f32>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[1,128,32,32],f32>
//    %867 = torch.prim.ListConstruct %int1, %int128, %int1024 : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
//    %868 = torch.aten.reshape %865, %867 : !torch.vtensor<[1,128,32,32],f32>, !torch.list<int> -> !torch.vtensor<[1,128,1024],f32>
    %869 = torch.aten.reshape %866, %867 : !torch.vtensor<[1,128,32,32],f32>, !torch.list<int> -> !torch.vtensor<[1,128,1024],f32>
//    %870 = torch.aten.transpose.int %868, %int1, %int2 : !torch.vtensor<[1,128,1024],f32>, !torch.int, !torch.int -> !torch.vtensor<[1,1024,128],f32>
    %871 = torch.aten.quantize_per_tensor %870, %float3.125000e-02, %int0, %int12 : !torch.vtensor<[1,1024,128],f32>, !torch.float, !torch.int, !torch.int -> !torch.vtensor<[1,1024,128],!torch.qint8>
    %872 = torch.aten.int_repr %871 : !torch.vtensor<[1,1024,128],!torch.qint8> -> !torch.vtensor<[1,1024,128],si8>
    %873 = torch.aten._make_per_tensor_quantized_tensor %872, %float3.125000e-02, %int0 : !torch.vtensor<[1,1024,128],si8>, !torch.float, !torch.int -> !torch.vtensor<[1,1024,128],!torch.qint8>
    %874 = torch.aten.dequantize.self %873 : !torch.vtensor<[1,1024,128],!torch.qint8> -> !torch.vtensor<[1,1024,128],f32>
    %875 = torch.aten.matmul %874, %869 : !torch.vtensor<[1,1024,128],f32>, !torch.vtensor<[1,128,1024],f32> -> !torch.vtensor<[1,1024,1024],f32>

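is failing to lower due to half-fusion with quantization (in the note below, the left matmul operand has become !torch.qint8 while the right operand is still f32):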

Raft.torch.mlir:910:12: error: failed to legalize operation 'torch.aten.matmul' that was explicitly marked illegal
    %875 = torch.aten.matmul %874, %869 : !torch.vtensor<[1,1024,128],f32>, !torch.vtensor<[1,128,1024],f32> -> !torch.vtensor<[1,1024,1024],f32>
           ^
Raft.torch.mlir:910:12: note: see current operation: %6659 = "torch.aten.matmul"(%6658, %6635) : (!torch.vtensor<[1,1024,128],!torch.qint8>, !torch.vtensor<[1,128,1024],f32>) -> !torch.vtensor<[1,1024,1024],f32>
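
To illustrate why a matmul with one quantized operand and one float operand has no consistent meaning, here is a minimal pure-Python sketch. It is not torch-mlir code: the helper names (`quantize`, `dequantize`, `matmul`, `map2d`) and the 2x2 inputs are hypothetical, and only the scale 0.03125 and zero point 0 are taken from the IR above. It assumes both operands share that scale and zero point. The fully fused form (integer matmul, then one rescale) reproduces the dequantize-then-matmul result, while mixing raw int8 values with floats is off by a factor of 1/scale.

```python
# Illustrative sketch only; helper names are hypothetical, not torch-mlir APIs.
SCALE = 0.03125   # %float3.125000e-02 in the IR
ZERO_POINT = 0    # %int0 in the IR


def quantize(x, scale=SCALE, zero_point=ZERO_POINT):
    """Map a float to a signed 8-bit integer: scale, round, shift, clamp."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale=SCALE, zero_point=ZERO_POINT):
    """Map a signed 8-bit integer back to the float it represents."""
    return (q - zero_point) * scale


def matmul(a, b):
    """Plain 2-D matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def map2d(fn, m):
    """Apply fn to every element of a nested-list matrix."""
    return [[fn(v) for v in row] for row in m]


# Example inputs chosen as exact multiples of SCALE so no rounding error occurs.
lhs = [[0.5, -1.0], [0.25, 2.0]]
rhs = [[1.0, 0.5], [-0.5, 0.75]]

q_lhs = map2d(quantize, lhs)
q_rhs = map2d(quantize, rhs)

# Unfused: dequantize both operands, then multiply in floating point.
reference = matmul(map2d(dequantize, q_lhs), map2d(dequantize, q_rhs))

# Fully fused: subtract zero points, multiply as integers, rescale once.
int_acc = matmul(map2d(lambda q: q - ZERO_POINT, q_lhs),
                 map2d(lambda q: q - ZERO_POINT, q_rhs))
fused = map2d(lambda acc: acc * SCALE * SCALE, int_acc)

# Half fused: raw int8 values on the left, floats on the right.
# The left operand's scale is never applied, so the result is wrong.
half_fused = matmul(q_lhs, map2d(dequantize, q_rhs))

print("reference :", reference)
print("fused     :", fused)
print("half fused:", half_fused)
```

In the real pipeline the half-fused op is rejected during legalization rather than producing wrong numbers; the sketch only shows why the two operands need to be treated the same way.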

zjgarvey self-assigned this May 3, 2024

zjgarvey mentioned this issue May 10, 2024

Generalize Operand Quantization in FuseQuantizeOps llvm/torch-mlir#3327

Merged

zjgarvey closed this as completed May 13, 2024
