
Generalize Operand Quantization in FuseQuantizeOps #3327

Merged: 4 commits into llvm:main on May 13, 2024

Conversation

zjgarvey (Collaborator)

This change enables more customizable operand quantization and generalizes the patterns QuantizeOperands and QuantizeTransposeOperands into a single pattern, QuantizeOperandsPastCommutingOps.

This allows quantization to be propagated through operations that are functionally unaffected by it, such as view-like ops. The purpose of this change is to address a variety of quantization issues seen in quantized ONNX models that have reshape-like operations sandwiched between a dequantize and something like a matmul (whose other operand is immediately quantizable).

If other model failures of this type occur in the future, authors can simply add the problematic ops to the qCommutingOps function defined here to let quantization pass through them.
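For reference, a check of this shape might look like the following. This is a minimal sketch, not the exact contents of the patch: the helper name comes from the description above, while the specific op list is an illustrative assumption.

```cpp
// Sketch of a commuting-op check (op list is illustrative).
#include "torch-mlir/Dialect/Torch/IR/TorchOps.h"

using namespace mlir;
using namespace mlir::torch::Torch;

// An op "commutes" with quantization if it does not change element
// values, so a dequantize can be hoisted past it. Newly discovered
// problematic view-like ops would be added to this list.
static bool isQCommutingOp(Operation *op) {
  return isa<AtenTransposeIntOp, AtenReshapeOp, AtenSliceTensorOp>(op);
}
```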

For each op, I have set the search depth to the largest depth I have seen in practice:

- For the aten mm op, see this issue.
- For the aten matmul op, see this issue.

If longer patterns exist, the depth can easily be increased.
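For illustration, per-op depths might be wired up as below. This is a hypothetical sketch: the template signature, the depth values, and the registration function name are assumptions based on the description above.

```cpp
// Hypothetical registration sketch: each rewritten op carries the
// maximum number of commuting ops the search will walk back through.
// If a longer dequant -> view-like -> matmul chain shows up in a
// model, only the depth argument here needs to be increased.
void populateFuseQuantizedOpsPatterns(RewritePatternSet &patterns,
                                      MLIRContext *context) {
  patterns.insert<QuantizeOperandsPastCommutingOps<AtenMmOp, /*depth=*/1>,
                  QuantizeOperandsPastCommutingOps<AtenMatmulOp, /*depth=*/4>>(
      context);
}
```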

@zjgarvey (Collaborator, Author)

Ah, I see the issue. Debugging currently.

@rsuderman merged commit 75d1d72 into llvm:main on May 13, 2024
3 checks passed
BaneTrifa pushed a commit to BaneTrifa/torch-mlir that referenced this pull request May 24, 2024