Fix `extract_slice` causing compilation errors #17519

Conversation
Abbreviated Benchmark Summary @ commit a108d2c016256b0e08484ce739e78f17399d5f68 (no previous benchmark results to compare).
Data-tiling comparison table and raw latencies: top 3 of 135 results shown; no improved or regressed compilation metrics. 🏖️
Signed-off-by: Ian Wood <[email protected]>
Thanks for working on this, @IanWood1.
I'm a bit worried about the generic ops produced in this pattern. For example, if the scale of the initial dequant op is very small so that the results are all very close to zero, does your generated generic op for casting f32 -> i8 just give a zero tensor?
```cpp
    sourceGenericOp.getIteratorTypesArray(),
    [&](OpBuilder &nestedBuilder, Location loc, ValueRange args) {
      // Custom region for f32 -> i8 conversion
      auto castOp = nestedBuilder.create<arith::FPToSIOp>(
```
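To see the reviewer's concern concretely: if the region is just a bare `arith.fptosi`, small dequantized values truncate to zero. A minimal standalone C++ illustration (hypothetical scale and data, not from this PR):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical dequantization scale; values are invented.
  const float scale = 1e-4f;
  const int8_t quantized[] = {3, 57, -90, 127};

  for (int8_t q : quantized) {
    float dequant = scale * static_cast<float>(q);
    // A bare cast truncates toward zero, like arith.fptosi:
    auto recast = static_cast<int8_t>(dequant);
    std::printf("q=%4d  dequant=%9.6f  recast=%d\n", q, dequant, recast);
  }
  // Every |dequant| < 1.0 truncates to 0, so without folding the
  // scale back in, the whole recast tensor collapses to zeros.
  return 0;
}
```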
How does modification by the quantization scale come into play here? If you just cast the fp to si, I can't imagine you will retain any useful information.
Is it possible to copy the payload of a generic op, but modify the other information?
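For reference, copying the payload while swapping out the rest is possible with `Region::cloneInto`. A rough sketch under current MLIR C++ APIs (`cloneGenericWithNewOutputs` is a hypothetical helper, not this PR's code, and it assumes the new operands remain compatible with the source op's indexing maps):

```cpp
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/IR/IRMapping.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Recreate `source` with different input/output operands but the
// exact same payload region, so any scale arithmetic in the original
// body is preserved instead of being replaced by a bare cast.
static linalg::GenericOp cloneGenericWithNewOutputs(
    RewriterBase &rewriter, linalg::GenericOp source,
    ValueRange newInputs, ValueRange newOutputs) {
  auto newOp = rewriter.create<linalg::GenericOp>(
      source.getLoc(), newOutputs.getTypes(), newInputs, newOutputs,
      source.getIndexingMapsArray(), source.getIteratorTypesArray());
  // Clone the original payload into the new op.
  IRMapping mapping;
  source.getRegion().cloneInto(&newOp.getRegion(), mapping);
  return newOp;
}
```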
Yeah, you're right, this is definitely a problem. I think that solution might work, I'll take a look.
compiler/src/iree/compiler/Dialect/Flow/Transforms/FusionPreprocessing.cpp
Signed-off-by: Ian Wood <[email protected]>
Addresses this SHARK-TestSuite issue related to `ExtractSliceOp` causing compilation failures. The issue originated from a dequantize-like -> `tensor.extract_slice` -> quantize-like pattern (within a dispatch), which meant the `tensor.extract_slice` op was operating on f32 instead of the quantized i8 types. MLIR doesn't handle bufferization across extract_slice/differing bit-width casts (see mlir/*/EmptyTensorElimination.cpp), causing stack allocations in the middle of dispatches. This is addressed by adding `linalg.generic` ops before and after the `extract_slice` op that cast back to the low bit-width type, eliminating the f32 tensors once they get merged with the dequantize-like/quantize-like generics.
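The shape of the rewrite looks roughly like the following (schematic IR only: shapes, indexing maps, and region bodies are invented; the real pattern lives in FusionPreprocessing.cpp):

```mlir
// Before: the slice operates on the dequantized f32 tensor.
%deq   = linalg.generic ... ins(%q : tensor<128x128xi8>)
           outs(%f0 : tensor<128x128xf32>) {...}   // dequantize-like
%slice = tensor.extract_slice %deq[0, 0] [64, 64] [1, 1]
           : tensor<128x128xf32> to tensor<64x64xf32>
%out   = linalg.generic ... ins(%slice : tensor<64x64xf32>)
           outs(%i0 : tensor<64x64xi8>) {...}      // quantize-like

// After: casts inserted around the slice keep it on i8; once the
// cast generics fuse with the dequantize/quantize generics, the
// large f32 tensor disappears from the dispatch.
%narrow = linalg.generic ... ins(%deq : tensor<128x128xf32>)
            outs(%i1 : tensor<128x128xi8>) {...}   // f32 -> i8
%slice2 = tensor.extract_slice %narrow[0, 0] [64, 64] [1, 1]
            : tensor<128x128xi8> to tensor<64x64xi8>
%widen  = linalg.generic ... ins(%slice2 : tensor<64x64xi8>)
            outs(%f1 : tensor<64x64xf32>) {...}    // i8 -> f32
```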
TODOs:
Related issues:
nod-ai/SHARK-TestSuite#182
nod-ai/SHARK-ModelDev#683