Add a lowering of `vector.interleave` to `vector.shuffle` #17346

bjacob · 2024-05-10T17:28:46Z

In llvm integrate #17330 we have to locally revert llvm/llvm-project#89131 because it causes vector.interleave to be created instead of vector.shuffle, and some GPU codegen backends expected vector.shuffle and are not handling vector.interleave.

llvm/llvm-project#89131 is good in itself though, as vector.interleave is more constrained than general vector.shuffle. We just need a lowering pattern from vector.interleave to vector.shuffle to be inserted into codegen pipelines. Then we will be able to drop the local revert of llvm/llvm-project#89131.

FYI @KoolJBlack @qedawkins @kuhar

The text was updated successfully, but these errors were encountered:

kuhar · 2024-05-10T17:46:23Z

Is this an mlir issue or iree issue? To me it sound like this should go to vector-to-spirv and vector-to-llvm?

bjacob · 2024-05-10T17:51:04Z

I was thinking vector-to-vector, rewriting vector.interleave to vector.shuffle. This way, only a single pattern is needed, not backend-specific, and by construction we know current backends are happy with vector.shuffle since that is what they currently get. It can go in this existing file:

https://github.com/llvm/llvm-project/blob/3bde7983986d8ce637f6bb506860859249787751/mlir/lib/Dialect/Vector/Transforms/LowerVectorInterleave.cpp#L4

Then also a IREE-side change to insert that new pattern into codegen pipelines; and also IREE-side, remembering to drop the revert of llvm/llvm-project#89131 at the following LLVM integrate.

kuhar · 2024-05-10T17:52:11Z

Oh OK, so this pattern is already there, just need to add add it to iree pipelines. Makes sense.

bjacob · 2024-05-10T17:52:55Z

no no, a file is there but it only contains an unrelated UnrollInterleaveOp pattern. The pattern that we need here does not exist yet, it needs to be created.

kuhar · 2024-05-10T17:53:04Z

On the spir-v side, I don't think there's any better lowering we could use anyway, spir-v has its own shuffle ops.

…800a3 (#17330) * torch-mlir integrated at bce800a. * llvm-project integrated at 2083e97e plus local changes: * Reverted llvm/llvm-project#89131 locally: while this change is good in its own right, the `vector.interleave` that it generates (instead of `vector.shuffle`) are not handled by some GPU codegen lowerings. * Filed #17346. * Cherry-picked Bazel build fix: llvm/llvm-project#91654 * Several e2e tests have been temporarily disabled, follow-up work is needed to reenable them: #17344 --------- Co-authored-by: MaheshRavishankar <[email protected]> Co-authored-by: Scott Todd <[email protected]>

bjacob · 2024-05-10T20:19:25Z

@kuhar @qedawkins , does this look like what we discussed? llvm/llvm-project#91800
Then I'm looking on the IREE side where to put this in the SPIRV pipeline. Maybe around

iree/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVInitialVectorLowering.cpp

Lines 298 to 511 in a3b7e12

    
             void runOnOperation() override { 
        
               MLIRContext *context = &getContext(); 
        
               auto funcOp = getOperation(); 
        
               bool emitIntegerDotProdOps = supportsIntegerDotProductOps(funcOp); 
        
               // First apply vectorization to generate vectors of the original tensor 
        
               // shape for tensor.pad ops. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 // Pull in additional vectorization patterns in IREE. 
        
                 populateVectorizePadPatterns(patterns); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after vectorizing tensor.pad"); 
        
               // Special peephole optimizations to clean up IR before further processing. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 // Pull in patterns to shuffle broadcast/transpose ops around in order to 
        
                 // cancel them or embed into contract ops. Embedding in the flexible 
        
                 // contract ops will help to sustain the structure through various 
        
                 // transformations. 
        
                 vector::populateVectorReductionToContractPatterns(patterns); 
        
                 // Pull in patterns to canonicalize transfer ops. 
        
                 vector::populateVectorTransferPermutationMapLoweringPatterns(patterns); 
        
                 // Fold consumer add ops into the contraction op itself. 
        
                 vector::ContractionOp::getCanonicalizationPatterns(patterns, context); 
        
                 // Fold transpose ops if possible as we cannot unroll it later. 
        
                 vector::TransposeOp::getCanonicalizationPatterns(patterns, context); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after peephole optimization"); 
        
               // High dimension contraction can appear after vectorizing ops like 1-D 
        
               // convolution. Those 1-D convolution ops typically have a leading unit 
        
               // batch dimension. Try to drop that to map to matmul dimensions better. 
        
               SmallVector<vector::ContractionOp> contractOps; 
        
               funcOp.walk([&](vector::ContractionOp op) { 
        
                 if (op.getIteratorTypes().size() > 3) 
        
                   contractOps.push_back(op); 
        
               }); 
        
               for (vector::ContractionOp op : contractOps) { 
        
                 OpBuilder builder(op); 
        
                 IRRewriter rewriter(builder); 
        
                 auto result = vector::castAwayContractionLeadingOneDim( 
        
                     op, /*maskingOp=*/nullptr, rewriter); 
        
                 if (succeeded(result)) { 
        
                   rewriter.replaceOp(op, *result); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after trimming contract leading unit dims"); 
        
               // Fold tensor.extract_slice/insert_slice ops into transfer ops. This helps 
        
               // to remove those tensor slice ops so that we can enable further vector op 
        
               // transformations. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 vector::TransferReadOp::getCanonicalizationPatterns(patterns, context); 
        
                 vector::TransferWriteOp::getCanonicalizationPatterns(patterns, context); 
        
                 populateVectorTransferTensorSliceTransforms(patterns); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after folding tensor extract/insert slice ops"); 
        
               // Lower vector.multi_dimension early if any operand is a transpose op. 
        
               // The lowering itself generates transpose ops. This helps to cancel 
        
               // transpose ops. vector.multi_reduction is arguably a higher level op and 
        
               // the lowering also unrolls the multi_reduction op, so it makes sense to 
        
               // happen before normal unrolling. 
        
               { 
        
                 SmallVector<Operation *> reductionOps; 
        
                 funcOp.walk([&](vector::MultiDimReductionOp reductionOp) { 
        
                   if (llvm::any_of(reductionOp->getOperands(), [](Value operand) { 
        
                         return operand.getDefiningOp<vector::TransposeOp>(); 
        
                       })) 
        
                     reductionOps.push_back(reductionOp); 
        
                   return WalkResult::advance(); 
        
                 }); 
        
                 RewritePatternSet patterns(context); 
        
                 vector::populateVectorMultiReductionLoweringPatterns( 
        
                     patterns, vector::VectorMultiReductionLowering::InnerParallel); 
        
                 if (failed(applyOpPatternsAndFold(reductionOps, std::move(patterns)))) { 
        
                   funcOp.emitOpError("vector lowering failed"); 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after lowering multi reduction ops"); 
        
               // Prepare for SPIR-V integer dot product lowering. 
        
               if (emitIntegerDotProdOps) { 
        
                 RewritePatternSet patterns(context); 
        
                 vector::populateVectorContractCanonicalizeMatmulToMMT( 
        
                     patterns, detectI8ToI32Matmul); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
                 debugPrint(funcOp, "after preparing for SPIR-V dot product lowering"); 
        
               } 
        
               // Then unroll vectors to native vector size. We try to use 128-bit 
        
               // vectors for memory access and 4/2/1 vector sizes for computation. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 populateVectorUnrollPatterns(patterns, emitIntegerDotProdOps); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after unrolling vector ops"); 
        
               // Lower reduction-unrolled vector contract ops. Such contract ops have 
        
               // their reduction dimensions all be one, so we can convert them into 
        
               // elementwise ops. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 auto options = 
        
                     vector::VectorTransformsOptions().setVectorTransformsOptions( 
        
                         vector::VectorContractLowering::ParallelArith); 
        
                 vector::populateVectorContractLoweringPatterns(patterns, options); 
        
                 // The pattern can generate transpose ops. Try to fold it if possible to 
        
                 // avoid lowering them into extract/insert later. 
        
                 vector::TransposeOp::getCanonicalizationPatterns(patterns, context); 
        
                 // It also generates broadcast/extract ops. Clean up them too. 
        
                 vector::BroadcastOp::getCanonicalizationPatterns(patterns, context); 
        
                 vector::ExtractOp::getCanonicalizationPatterns(patterns, context); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after lowering size-1 reduction contract ops"); 
        
               // Now lower vector transpose given we have handled vector patterns that may 
        
               // generate transpose ops in previous steps. This converts transpose ops 
        
               // into extract and insert pairs, in preparation of further transformations 
        
               // to canonicalize/cancel. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 auto options = 
        
                     vector::VectorTransformsOptions().setVectorTransposeLowering( 
        
                         vector::VectorTransposeLowering::EltWise); 
        
                 vector::populateVectorTransposeLoweringPatterns(patterns, options); 
        
                 vector::populateVectorShapeCastLoweringPatterns(patterns); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after lowering transpose ops"); 
        
               // Next run canonicalization to cast away leading size-1 dimensions. They 
        
               // can be generated from vector unrolling and generally cause issues to 
        
               // cancel corresponding read/write or insert/extract op pairs. This also 
        
               // need to happen before hoisting, where we would make certain vectors loop 
        
               // carried. Once that's done, it's hard to handle the leading size-1 
        
               // dimensions across regions. 
        
               { 
        
                 RewritePatternSet patterns(context); 
        
                 // We need to pull in casting way leading one dims to allow cancelling 
        
                 // some read/write ops. 
        
                 vector::populateCastAwayVectorLeadingOneDimPatterns(patterns); 
        
                 // We may have vector.insert_strided_slice inserting 1-D native vectors 
        
                 // into n-D larger vectors with the above. Break that down too. This is a 
        
                 // companion transformation of unrolling. 
        
                 vector::populateVectorInsertExtractStridedSliceDecompositionPatterns( 
        
                     patterns); 
        
                 vector::ExtractOp::getCanonicalizationPatterns(patterns, context); 
        
                 // Trimming leading unit dims may generate broadcast/shape_cast ops. Clean 
        
                 // them up. 
        
                 vector::BroadcastOp::getCanonicalizationPatterns(patterns, context); 
        
                 vector::ShapeCastOp::getCanonicalizationPatterns(patterns, context); 
        
                 vector::TransferReadOp::getCanonicalizationPatterns(patterns, context); 
        
                 vector::TransferWriteOp::getCanonicalizationPatterns(patterns, context); 
        
                 populateVectorTransferTensorSliceTransforms(patterns); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
               } 
        
               debugPrint(funcOp, "after trimming leading unit dims"); 
        
               // Lower vector reduction to SPIR-V integer dot product. 
        
               if (emitIntegerDotProdOps) { 
        
                 RewritePatternSet patterns(context); 
        
                 populateVectorReductionToSPIRVDotProductPatterns(patterns); 
        
                 if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns)))) { 
        
                   return signalPassFailure(); 
        
                 } 
        
                 debugPrint(funcOp, "after lowering to SPIR-V dot product"); 
        
               } 
        
             } 
        
           };

? where exactly in that big function?

kuhar · 2024-05-10T20:24:52Z

Can we put it upstream in https://github.com/llvm/llvm-project/blob/e9f53e4095d8a8600b5c5d445c73e2d5a6f45abb/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L812 ?

kuhar · 2024-05-10T20:26:40Z

Otherwise here also seems fine:

iree/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVInitialVectorLowering.cpp

Line 480 in a3b7e12

vector::populateVectorInsertExtractStridedSliceDecompositionPatterns(

qedawkins · 2024-05-10T20:26:57Z

Nice, that looks great to me. In terms of the IREE side, I think some combination of adding the vector.interleave n-d -> 1-d here:

iree/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVInitialVectorLowering.cpp

Line 330 in a3b7e12

vector::TransposeOp::getCanonicalizationPatterns(patterns, context);

And then the interleave to shuffle can either go in the same place, or somewhere near here:

iree/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVInitialVectorLowering.cpp

Line 391 in a3b7e12

patterns, vector::VectorMultiReductionLowering::InnerParallel);

Basically after decomposing to 1d and before unrolling to 1/2/3/4 vector elements.

qedawkins · 2024-05-10T20:28:12Z

Otherwise here also seems fine:

iree/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVInitialVectorLowering.cpp

Line 480 in a3b7e12

vector::populateVectorInsertExtractStridedSliceDecompositionPatterns(

Correct me if otherwise, but I was thinking it had to happen before unrolling to <= 4 elements? Unless interleave implements the unrolling interface already.

kuhar · 2024-05-10T20:29:47Z

My hope would be that unrolling could break it down to source vectors in some cases already, but I haven't checked if it support the unrolling interface.

kuhar · 2024-05-10T20:31:43Z

Basically after decomposing to 1d and before unrolling to 1/2/3/4 vector elements.

This is also a good option if it doesn't support unrolling.

bjacob · 2024-05-10T20:34:15Z

Can we put it upstream in https://github.com/llvm/llvm-project/blob/e9f53e4095d8a8600b5c5d445c73e2d5a6f45abb/mlir/lib/Conversion/VectorToSPIRV/VectorToSPIRV.cpp#L812 ?

I have updated the PR with that, thanks for the tip!

Nice, that looks great to me. In terms of the IREE side, I think some combination of adding the vector.interleave n-d -> 1-d here:

@kuhar @qedawkins I am just trying to fix the immediate issue that is forcing us to carry a local LLVM revert here. This issue only seems to involve 1D vectors as far as I have seen so far.

qedawkins · 2024-05-10T20:41:58Z

Makes sense, then I would say any point before here is likely to work:

iree/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVInitialVectorLowering.cpp

Line 416 in a3b7e12

populateVectorUnrollPatterns(patterns, emitIntegerDotProdOps);

bjacob · 2024-05-10T20:43:40Z

So should the upstream PR do it in VectorToSPIRV.cpp#L812 or not?

If it should do that upstream, then nothing needs to be done on the IREE side, right?

qedawkins · 2024-05-10T20:46:14Z

I'm guessing we'll still need to do something on the IREE side because in addition to the requirement that all vectors must be 1-d, on SPIR-V they must also be <= 4 elements wide, and the unrolling that goes to <= 4 elements happens in IREE right now. So I'm thinking VectorToSPIRV will be too late.

bjacob · 2024-05-10T20:49:35Z

but that sounds like solving a more general problem than my immediate concern of flattening our llvm integration. The upstream change that we have to carry a local revert of isn't changing the number of elements in a vector, or the rank of a vector. It's just changing vector.shuffle-on-1d-vector to vector.interleave-on-1d-vector.

…le in VectorToSPIRV (#91800) Context: iree-org/iree#17346. Test IREE integrate showing it's fixing the problem it's intended to fix, i.e. it allows IREE to drop its local revert of #89131: iree-org/iree#17359 This is added to VectorToSPIRV because SPIRV doesn't currently handle `vector.interleave` (see motivating context above). This is limited to 1D, non-scalable vectors.

…le in VectorToSPIRV (#92012) This is the second attempt at merging #91800, which bounced due to a linker error apparently caused by an undeclared dependency. `MLIRVectorToSPIRV` needed to depend on `MLIRVectorTransforms`. In fact that was a preexisting issue already flagged by the tool in https://discourse.llvm.org/t/ninja-can-now-check-for-missing-cmake-dependencies-on-generated-files/74344. Context: iree-org/iree#17346. Test IREE integrate showing it's fixing the problem it's intended to fix, i.e. it allows IREE to drop its local revert of #89131: iree-org/iree#17359 This is added to VectorToSPIRV because SPIRV doesn't currently handle `vector.interleave` (see motivating context above). This is limited to 1D, non-scalable vectors.

…800a3 (iree-org#17330) * torch-mlir integrated at bce800a. * llvm-project integrated at 2083e97e plus local changes: * Reverted llvm/llvm-project#89131 locally: while this change is good in its own right, the `vector.interleave` that it generates (instead of `vector.shuffle`) are not handled by some GPU codegen lowerings. * Filed iree-org#17346. * Cherry-picked Bazel build fix: llvm/llvm-project#91654 * Several e2e tests have been temporarily disabled, follow-up work is needed to reenable them: iree-org#17344 --------- Co-authored-by: MaheshRavishankar <[email protected]> Co-authored-by: Scott Todd <[email protected]>

…800a3 (iree-org#17330) * torch-mlir integrated at bce800a. * llvm-project integrated at 2083e97e plus local changes: * Reverted llvm/llvm-project#89131 locally: while this change is good in its own right, the `vector.interleave` that it generates (instead of `vector.shuffle`) are not handled by some GPU codegen lowerings. * Filed iree-org#17346. * Cherry-picked Bazel build fix: llvm/llvm-project#91654 * Several e2e tests have been temporarily disabled, follow-up work is needed to reenable them: iree-org#17344 --------- Co-authored-by: MaheshRavishankar <[email protected]> Co-authored-by: Scott Todd <[email protected]> Signed-off-by: Lubo Litchev <[email protected]>

bjacob mentioned this issue May 10, 2024

Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3 #17330

Merged

bjacob mentioned this issue May 10, 2024

[mlir][vector] Add Vector-dialect interleave-to-shuffle pattern, enable in VectorToSPIRV llvm/llvm-project#91800

Merged

bjacob mentioned this issue May 13, 2024

[mlir][vector] Add Vector-dialect interleave-to-shuffle pattern, enable in VectorToSPIRV llvm/llvm-project#92012

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a lowering of `vector.interleave` to `vector.shuffle` #17346

Add a lowering of `vector.interleave` to `vector.shuffle` #17346

bjacob commented May 10, 2024

kuhar commented May 10, 2024

bjacob commented May 10, 2024 •

edited

Loading

kuhar commented May 10, 2024

bjacob commented May 10, 2024

kuhar commented May 10, 2024

bjacob commented May 10, 2024

kuhar commented May 10, 2024

kuhar commented May 10, 2024

qedawkins commented May 10, 2024

qedawkins commented May 10, 2024

kuhar commented May 10, 2024

kuhar commented May 10, 2024

bjacob commented May 10, 2024

qedawkins commented May 10, 2024

bjacob commented May 10, 2024

qedawkins commented May 10, 2024 •

edited

Loading

bjacob commented May 10, 2024 •

edited

Loading

Add a lowering of vector.interleave to vector.shuffle #17346

Add a lowering of vector.interleave to vector.shuffle #17346

Comments

bjacob commented May 10, 2024

kuhar commented May 10, 2024

bjacob commented May 10, 2024 • edited Loading

kuhar commented May 10, 2024

bjacob commented May 10, 2024

kuhar commented May 10, 2024

bjacob commented May 10, 2024

kuhar commented May 10, 2024

kuhar commented May 10, 2024

qedawkins commented May 10, 2024

qedawkins commented May 10, 2024

kuhar commented May 10, 2024

kuhar commented May 10, 2024

bjacob commented May 10, 2024

qedawkins commented May 10, 2024

bjacob commented May 10, 2024

qedawkins commented May 10, 2024 • edited Loading

bjacob commented May 10, 2024 • edited Loading

Add a lowering of `vector.interleave` to `vector.shuffle` #17346

Add a lowering of `vector.interleave` to `vector.shuffle` #17346

bjacob commented May 10, 2024 •

edited

Loading

qedawkins commented May 10, 2024 •

edited

Loading

bjacob commented May 10, 2024 •

edited

Loading