Generalize Operand Quantization in FuseQuantizeOps #3327
Merged
This change enables more customization of operand quantization and generalizes the patterns QuantizeOperands and QuantizeTransposeOperands into a single pattern, QuantizeOperandsPastCommutingOps.
This allows quantization to propagate through operations that are functionally unaffected by it, such as view-like ops. The purpose of this change is to address the many quantization failures seen in quantized ONNX models that have reshape-like operations sandwiched between a dequantize and an op like a matmul (whose other operand is immediately quantizable).
If other model failures of this type occur in the future, authors can simply add the problematic ops to the qCommutingOps function defined here to let quantization pass through them, as sketched below.
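As a rough illustration only (not the exact code from this patch), the commuting-op check could look something like the following. The function name `isQCommutingOp` and the specific op set here are assumptions for the sketch; the real list lives in the qCommutingOps function in this change:

```cpp
#include "torch-mlir/Dialect/Torch/IR/TorchOps.h"

using namespace mlir;
using namespace mlir::torch::Torch;

// Sketch only: illustrative predicate for ops that quantization can
// safely commute past. The ops below only rearrange elements without
// changing their values, so a dequantize feeding them can slide past
// them to a consumer such as a matmul. The name and op set here are
// hypothetical; see qCommutingOps in the patch for the real list.
static bool isQCommutingOp(Operation *op) {
  return isa<AtenTransposeIntOp, AtenReshapeOp, AtenSliceTensorOp>(op);
}
```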
For each op, I have set the search depth to the largest depth I have seen in practice:

- For the `aten.mm` op, see this issue.
- For the `aten.matmul` op, see this issue.

If longer patterns exist, the depth can easily be increased.
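To show how those per-op depths might plug in, here is a hedged sketch of pattern registration. QuantizeOperandsPastCommutingOps is the pattern introduced by this change, but the helper name `populateFuseQuantizedPatterns` and the concrete depth values below are placeholders, not the patch's actual numbers:

```cpp
// Sketch only: hypothetical registration of the generalized pattern.
// The integer template argument bounds how many commuting ops the
// rewrite walks through before giving up; raising it handles longer
// chains of view-like ops between the dequantize and the consumer.
void populateFuseQuantizedPatterns(RewritePatternSet &patterns) {
  MLIRContext *context = patterns.getContext();
  patterns.add<QuantizeOperandsPastCommutingOps<AtenMmOp, /*depth=*/2>,
               QuantizeOperandsPastCommutingOps<AtenMatmulOp, /*depth=*/4>>(
      context);
}
```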