Generalize Operand Quantization in FuseQuantizeOps #3327
Merged
This change enables more customization of operand quantization and generalizes the patterns QuantizeOperands and QuantizeTransposeOperands into a single pattern, QuantizeOperandsPastCommutingOps.
This allows quantization to propagate through operations that are functionally unaffected by it, such as view-like ops. The purpose of this change is to address the many quantization failures seen in quantized ONNX models that have reshape-like operations sandwiched between a dequantize and an op like a matmul (whose other operand is immediately quantizable).
If other model failures of this type occur in the future, authors can simply add the problematic ops to the qCommutingOps function defined here to let quantization pass through them, as sketched below.
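As a rough illustration only (not the exact code from this patch), the commuting-op check could look something like the following. The function name `isQCommutingOp` and the specific op set here are assumptions for the sketch; the real list lives in the qCommutingOps function in this change:

```cpp
#include "torch-mlir/Dialect/Torch/IR/TorchOps.h"

using namespace mlir;
using namespace mlir::torch::Torch;

// Sketch only: illustrative predicate for ops that quantization can
// safely commute past. The ops below only rearrange elements without
// changing their values, so a dequantize feeding them can slide past
// them to a consumer such as a matmul. The name and op set here are
// hypothetical; see qCommutingOps in the patch for the real list.
static bool isQCommutingOp(Operation *op) {
  return isa<AtenTransposeIntOp, AtenReshapeOp, AtenSliceTensorOp>(op);
}
```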
For each op, I have set the search depth to the largest depth I have seen in practice:

- For the `aten.mm` op, see this issue.
- For the `aten.matmul` op, see this issue.

If longer patterns exist, the depth can easily be increased.
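To show how those per-op depths might plug in, here is a hedged sketch of pattern registration. QuantizeOperandsPastCommutingOps is the pattern introduced by this change, but the helper name `populateFuseQuantizedPatterns` and the concrete depth values below are placeholders, not the patch's actual numbers:

```cpp
// Sketch only: hypothetical registration of the generalized pattern.
// The integer template argument bounds how many commuting ops the
// rewrite walks through before giving up; raising it handles longer
// chains of view-like ops between the dequantize and the consumer.
void populateFuseQuantizedPatterns(RewritePatternSet &patterns) {
  MLIRContext *context = patterns.getContext();
  patterns.add<QuantizeOperandsPastCommutingOps<AtenMmOp, /*depth=*/2>,
               QuantizeOperandsPastCommutingOps<AtenMatmulOp, /*depth=*/4>>(
      context);
}
```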