
Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3 #17330

Merged: 27 commits into iree-org:main on May 10, 2024

Conversation

bjacob (Contributor) commented May 9, 2024

AmosLewis (Contributor):

Could you bump torch-mlir one more commit forward? This will unblock a lot of SHARK-TestSuite ONNX model tests, which have a deadline this week.
commit ec6d7aa
Author: aldesilv [email protected]
Date: Wed May 8 14:35:03 2024 -0700

OnnxToTorch lowering resize op (#3013)
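
For context, a torch-mlir bump in IREE is a submodule update. A minimal sketch of the requested bump, assuming IREE's standard `third_party/` layout (the workflow itself is an editorial illustration, not taken from this thread):

```bash
# Sketch only: move the torch-mlir submodule one commit forward to ec6d7aa.
cd third_party/torch-mlir
git fetch origin
git checkout ec6d7aa   # "OnnxToTorch lowering resize op (#3013)"
cd ../..
git add third_party/torch-mlir
git commit -m "Bump torch-mlir to ec6d7aa"
```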

AmosLewis (Contributor):

Why not bump IREE's llvm-project to dabdec1001dc368373dd581cf72f37a440873ce, the same commit that torch-mlir bce800a3 brings in?

Member:

The opt-125M failures on https://github.com/iree-org/iree/actions/runs/9020826623/job/24787542994?pr=17330#step:9:39 are odd:

___________ IREE compile and run: opt-125M::gpu_vulkan_real_weights ____________
Error invoking iree-compile
Error code: 1
Stderr diagnostics:
opt-125M.mlirbc:0:0: error: attempting to parse a byte at the end of the bytecode
opt-125M.mlirbc:0:0: note: in bytecode version 6 produced by: MLIR19.0.0git


Invoked with:
  cd /home/esaimana/actions-runner/_work/iree/iree/SHARK-TestSuite/iree_tests/pytorch/models/opt-125M && iree-compile opt-125M.mlirbc --iree-hal-target-backends=vulkan-spirv -o opt-125M_gpu_vulkan_real_weights.vmfb

I don't think that file has been modified, and the test was passing before these changes. Did the bytecode format or its parsing change?
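
One way to separate "the file is corrupt" from "the parser changed" is to round-trip the bytecode with upstream `mlir-opt`. A hedged sketch (the flags are upstream MLIR flags; whether stock `mlir-opt` handles this particular file's dialects cleanly is an assumption):

```bash
# Can the current parser read the old file at all?
# mlir-opt reads .mlirbc directly; the torch/onnx dialects in this model are
# not registered in stock mlir-opt, hence the extra flag.
mlir-opt --allow-unregistered-dialect opt-125M.mlirbc -o /tmp/opt-125M.mlir

# If that works, re-emit bytecode with the current toolchain; the bytecode
# version can be pinned to match the failing file's "bytecode version 6".
mlir-opt --allow-unregistered-dialect /tmp/opt-125M.mlir \
  --emit-bytecode --emit-bytecode-version=6 -o /tmp/opt-125M.new.mlirbc
```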

bjacob (Contributor, Author):

This is still happening. @MaheshRavishankar @ScottTodd, what do we do? Can we check whether an MLIR bytecode version bump just occurred, and then either regenerate the file or temporarily disable this test?

Member:

I don't see any recent changes in https://github.com/llvm/llvm-project/tree/main/mlir/include/mlir/Bytecode or https://github.com/llvm/llvm-project/tree/main/mlir/lib/Bytecode. Not sure why just that one test would be failing, so I'd like to investigate/debug.

To keep the integrate train rolling, we can disable them by marking them as expected compile failures in the models files at https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite, as sketched below.
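
For illustration, a hypothetical way to record those expectations (the config file name, the `expected_compile_failures` key, and the use of `jq` rather than a hand edit are all assumptions modeled on that directory's conventions):

```bash
# Hypothetical: append the two newly-failing tests to the Vulkan models
# config's "expected_compile_failures" list, then review the diff.
jq '.expected_compile_failures += ["pytorch/models/opt-125M", "pytorch/models/resnet50"]' \
  build_tools/pkgci/external_test_suite/models_gpu_vulkan.json > /tmp/cfg.json
mv /tmp/cfg.json build_tools/pkgci/external_test_suite/models_gpu_vulkan.json
git diff build_tools/pkgci/external_test_suite/
```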

Contributor:

I thought Vulkan was always failing for SDXL?

ScottTodd (Member) commented May 9, 2024:

Oh, the resnet test is also failing. Those two (opt-125M and resnet50) are new, real failures in parsing the bytecode. The SDXL Vulkan failures are already marked XFAIL; ignore those.

Contributor:

Let's wait for CI to run again. This might be related to the vector.shuffle/vector.interleave issue.
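
For background: per the integrate notes later in this thread, llvm/llvm-project#89131 made some lowerings emit `vector.interleave` where an even/odd `vector.shuffle` used to appear, and not all GPU codegen paths handled the new op yet. A minimal sketch of the shuffle form of an interleave, runnable with upstream `mlir-opt` (illustrative only, not the exact IR IREE hits):

```bash
# Even/odd interleave of two 4-element vectors, written as the
# vector.shuffle that GPU lowerings already understood. #89131 started
# producing vector.interleave for this pattern instead (that op's exact
# syntax at this revision is omitted here).
mlir-opt - <<'EOF'
func.func @interleave(%a: vector<4xf32>, %b: vector<4xf32>) -> vector<8xf32> {
  %0 = vector.shuffle %a, %b [0, 4, 1, 5, 2, 6, 3, 7]
      : vector<4xf32>, vector<4xf32>
  return %0 : vector<8xf32>
}
EOF
```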

Contributor:

Oh no, this is bad! It is failing on ROCm now. Maybe the error is misleading, but is it failing to read the input file?

bjacob (Contributor, Author) commented May 9, 2024:

@AmosLewis:

> Could you bump torch-mlir one more commit forward?

OK, since it's just one more commit (we need this process to converge now).

> Why not bump IREE's llvm-project to dabdec1001dc368373dd581cf72f37a440873ce, the same commit that torch-mlir bce800a3 brings in?

Because IREE requires additional fixes.

bjacob changed the title from "Integrate both llvm-project@2083e97e and torch-mlir@bce800a3" to "Integrate both llvm-project@2083e97e and torch-mlir@ec6d7aa5d28" (May 9, 2024)
bjacob changed the title from "Integrate both llvm-project@2083e97e and torch-mlir@ec6d7aa5d28" to "Integrate both llvm-project@2083e97e (+ 1 revert) and torch-mlir@ec6d7aa5d28" (May 9, 2024)
MaheshRavishankar (Contributor):

There are some other tests also failing with the same bytecode parsing error.

ScottTodd (Member):

> There are some other tests also failing with the same bytecode parsing error.

https://github.com/iree-org/iree/actions/runs/9023942190/job/24797359659?pr=17330#step:7:720

/work/build-e2e-test-artifacts/e2e_test_artifacts/model_Falcon7bGptqPT.mlirbc:0:0: error: attempting to parse a byte at the end of the bytecode
/work/build-e2e-test-artifacts/e2e_test_artifacts/model_Falcon7bGptqPT.mlirbc:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

Okay, phew (or eek): it's not just files that I uploaded myself. That rules out some things. Let's try to find whatever upstream change(s) affected the parsing.
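
One systematic way to hunt for the culprit is to bisect the llvm-project submodule between the last good integrate and this one, using the failing parse as the test. A sketch; the SHAs, build directory, and model path are placeholders, and a build failure would be misclassified as "bad":

```bash
cd third_party/llvm-project
git bisect start <this-integrate-llvm-sha> <previous-integrate-llvm-sha>
# git bisect run treats exit 0 as good and nonzero as bad.
git bisect run sh -c '
  cmake --build ../../build --target iree-compile &&
  ../../build/tools/iree-compile /path/to/model_Falcon7bGptqPT.mlirbc \
      --iree-hal-target-backends=llvm-cpu -o /dev/null
'
git bisect reset
```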

ScottTodd (Member):

We might have a CUDA test newly timing out: https://github.com/iree-org/iree/actions/runs/9023942176/job/24797058825?pr=17330. I had to cancel that job after 6 hours.

* Edited the source .py files to remove benchmarks that failed on CI with `error: attempting to parse a byte at the end of the bytecode`
* Ran `bash ./build_tools/scripts/generate_cmake_files.sh`
* Fixed path separators (Windows vs Linux)
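
The cleanup steps above roughly correspond to the following sketch (the `generate_cmake_files.sh` path is from the commit message; the `sed` normalization is an assumption about where the backslashes crept in):

```bash
# After removing the failing benchmarks from the source .py definitions,
# regenerate the checked-in CMake files:
bash ./build_tools/scripts/generate_cmake_files.sh

# Normalize Windows-style path separators in the regenerated files
# (assumption: backslashes appear only in paths there); review before commit.
git diff --name-only | xargs -r sed -i 's|\\|/|g'
git diff
```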
bjacob changed the title from "Integrate both llvm-project@2083e97e (+ 1 revert) and torch-mlir@ec6d7aa5d28" to "Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@ec6d7aa5d28" (May 10, 2024)
bjacob changed the title from "Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@ec6d7aa5d28" to "Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3" (May 10, 2024)
bjacob (Contributor, Author) commented May 10, 2024:

@AmosLewis:

> Could you bump torch-mlir one more commit forward?

> OK, since it's just one more commit (we need this process to converge now).

Actually, this (llvm/torch-mlir#3013) turned out to cause additional problems here: #17345. This integrate has been reverted to its original torch-mlir target.

Discord conversation:
https://discord.com/channels/689900678990135345/1080178290188374049/1238530135444029574

ScottTodd (Member) left a review:

LGTM when CI goes green (assuming also that benchmarks don't have any glaring regressions)


Abbreviated Benchmark Summary

@ commit eab01522fae6b391ca7cb61469647c3862c11e0e (vs. base c81496c512d01cda26233932aba1358b8abe8164)

Data-Tiling Comparison Table

Latencies in ms; each cell shows latency (speedup vs. the No-DT baseline).

| Name | No-DT (baseline) | DT-Only | DT-UK |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 218.871 (1.0X) | 138.787 (1.6X) | 114.534 (1.9X) |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 682.239 (1.0X) | 279.415 (2.4X) | 229.550 (3.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.321 (1.0X) | 40.425 (0.8X) | 33.093 (1.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.951 (1.0X) | 9.505 (0.7X) | 8.592 (0.8X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 266.821 (1.0X) | 262.557 (1.0X) | 232.773 (1.1X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.684 (1.0X) | 36.950 (0.9X) | 33.602 (1.0X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 28.917 (1.0X) | 52.818 (0.5X) | 15.415 (1.9X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.905 (1.0X) | 11.113 (0.5X) | 5.288 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.217 (1.0X) | 36.882 (1.9X) | 38.583 (1.8X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.939 (1.0X) | 8.716 (1.0X) | 8.485 (1.1X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 87.885 (1.0X) | 41.768 (2.1X) | 40.569 (2.2X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 10.548 (1.0X) | 8.647 (1.2X) | 8.206 (1.3X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 76.774 (1.0X) | 85.352 (0.9X) | 62.002 (1.2X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.019 (1.0X) | 14.663 (0.8X) | 12.829 (0.9X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 180.607 (1.0X) | 250.536 (0.7X) | 187.278 (1.0X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.365 (1.0X) | 62.764 (0.5X) | 57.374 (0.6X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.777 (1.0X) | 251.807 (0.7X) | 191.730 (0.9X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.947 (1.0X) | 62.745 (0.6X) | 58.262 (0.6X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 481.072 (1.0X) | 1055.075 (0.5X) | 213.682 (2.3X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 61.027 (1.0X) | 219.863 (0.3X) | 64.103 (1.0X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 28.163 (1.0X) | 22.544 (1.2X) | 18.165 (1.6X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.724 (1.0X) | 5.149 (0.9X) | 4.477 (1.1X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.959 (1.0X) | 15.855 (0.8X) | 12.312 (1.0X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.640 (1.0X) | 5.260 (0.7X) | 4.874 (0.7X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.470 (1.0X) | 44.649 (0.5X) | 14.058 (1.5X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.751 (1.0X) | 9.903 (0.6X) | 5.591 (1.0X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.796 (1.0X) | 3.858 (0.7X) | 3.088 (0.9X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.911 (1.0X) | 3.931 (0.7X) | 3.221 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.853 (1.0X) | 40.555 (0.9X) | 32.258 (1.1X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.390 (1.0X) | 10.683 (0.8X) | 9.524 (0.9X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.704 (1.0X) | 1.422 (0.5X) | 0.600 (1.2X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.775 (1.0X) | 1.499 (0.5X) | 0.661 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.600 (1.0X) | 26.606 (0.7X) | 21.143 (0.8X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.213 (1.0X) | 6.087 (0.7X) | 5.257 (0.8X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.563 (1.0X) | 7.603 (1.0X) | 7.587 (1.0X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 49.615 (1.0X) | 84.214 (0.6X) | 79.572 (0.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 51.567 (1.0X) | 86.287 (0.6X) | 79.379 (0.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 31.100 (1.0X) | 49.997 (0.6X) | 46.370 (0.7X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 92.140 (1.0X) | 22.242 (4.1X) | 22.435 (4.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 92.637 (1.0X) | 22.244 (4.2X) | 22.490 (4.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 52.552 (1.0X) | 22.105 (2.4X) | 22.205 (2.4X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 141.830 (1.0X) | 28.417 (5.0X) | 27.302 (5.2X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 141.426 (1.0X) | 30.145 (4.7X) | 29.499 (4.8X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 76.736 (1.0X) | 26.926 (2.8X) | 26.157 (2.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 701.147 (1.0X) | 445.483 (1.6X) | 363.188 (1.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 713.215 (1.0X) | 452.159 (1.6X) | 371.107 (1.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 410.945 (1.0X) | 274.823 (1.5X) | 224.618 (1.8X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1045.946 (1.0X) | 621.012 (1.7X) | 256.017 (4.1X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1046.877 (1.0X) | 625.588 (1.7X) | 260.366 (4.0X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 549.582 (1.0X) | 340.229 (1.6X) | 152.081 (3.6X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2097.472 (1.0X) | 1084.034 (1.9X) | 305.720 (6.9X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2098.906 (1.0X) | 1086.373 (1.9X) | 307.993 (6.8X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1134.974 (1.0X) | 612.057 (1.9X) | 185.600 (6.1X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 12.116 (1.0X) | 10.035 (1.2X) | 1.460 (8.3X) |
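
For readers decoding the column labels: No-DT compiles without data tiling, DT-Only enables data tiling, and DT-UK additionally enables microkernels. A hedged sketch of the corresponding `iree-compile` invocations (flag names reflect IREE's CPU options of this era and are assumptions, not taken from the bot's configuration):

```bash
# No-DT baseline:
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling=false -o no_dt.vmfb

# DT-Only: data tiling on, microkernels off:
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling=true --iree-llvmcpu-enable-ukernels=none -o dt_only.vmfb

# DT-UK: data tiling plus microkernels:
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling=true --iree-llvmcpu-enable-ukernels=all -o dt_uk.vmfb
```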

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 112.795 (vs. 89.268, 26.36%↑) | 113.191 | 1.329 |
| MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 97.999 (vs. 81.405, 20.38%↑) | 97.928 | 0.602 |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 86.750 (vs. 79.551, 9.05%↑) | 86.738 | 0.323 |

[Top 3 of 7 results shown]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 29.499 (vs. 32.260, 8.56%↓) | 29.673 | 0.880 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 30.145 (vs. 32.837, 8.20%↓) | 30.613 | 1.036 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 49.997 (vs. 53.585, 6.70%↓) | 50.484 | 1.594 |

[Top 3 of 15 results shown]

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

ScottTodd merged commit a3b7e12 into iree-org:main on May 10, 2024
64 of 65 checks passed
bjacob added a commit that referenced this pull request May 14, 2024
This allows dropping our existing local revert of
llvm/llvm-project#89131 and cherry-pick of
llvm/llvm-project#91654, which we had introduced
in the earlier integrate #17330.

This locally reverts llvm/llvm-project#90802
because it causes numerical errors, reported at
llvm/llvm-project#90802 (comment).
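
Mechanically, carrying a local revert or cherry-pick on the llvm-project submodule looks roughly like this (a sketch; the SHAs corresponding to the PRs above are placeholders, and the exact workflow IREE's integrators use is an assumption):

```bash
cd third_party/llvm-project
# Drop a problematic upstream change locally:
git revert --no-edit <commit-sha-of-change-to-revert>
# Pull in a fix that hasn't reached the pinned revision yet:
git cherry-pick <commit-sha-of-needed-fix>
cd ../..
git add third_party/llvm-project
git commit -m "Carry local revert + cherry-pick on llvm-project"
```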
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Jun 5, 2024
…800a3 (iree-org#17330)

* torch-mlir integrated at bce800a.
* llvm-project integrated at 2083e97e plus local changes:
  * Reverted llvm/llvm-project#89131 locally: while this change is good in its own right, the `vector.interleave` ops that it generates (instead of `vector.shuffle`) are not handled by some GPU codegen lowerings. Filed iree-org#17346.
  * Cherry-picked the Bazel build fix llvm/llvm-project#91654.
* Several e2e tests have been temporarily disabled; follow-up work is needed to re-enable them: iree-org#17344.

---------

Co-authored-by: MaheshRavishankar <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Jun 5, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024