
Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3 #17330

Merged: 27 commits into iree-org:main on May 10, 2024

Conversation

bjacob (Contributor) commented May 9, 2024

AmosLewis (Contributor):

Could you bump torch-mlir one more commit forward? This will unblock a lot of SHARK-TestSuite ONNX model tests, which have a deadline this week.
commit ec6d7aa
Author: aldesilv [email protected]
Date: Wed May 8 14:35:03 2024 -0700

OnnxToTorch lowering resize op (#3013)
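
For context, a torch-mlir bump in IREE is a submodule update. A minimal sketch of the requested bump, assuming IREE's standard `third_party/` layout (the workflow itself is an editorial illustration, not taken from this thread):

```bash
# Sketch only: move the torch-mlir submodule one commit forward to ec6d7aa.
cd third_party/torch-mlir
git fetch origin
git checkout ec6d7aa   # "OnnxToTorch lowering resize op (#3013)"
cd ../..
git add third_party/torch-mlir
git commit -m "Bump torch-mlir to ec6d7aa"
```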

AmosLewis (Contributor):

Why not bump IREE's llvm-project to dabdec1001dc368373dd581cf72f37a440873ce, the same commit that torch-mlir bce800a3 brings in?

Member:

The opt-125M failures on https://github.com/iree-org/iree/actions/runs/9020826623/job/24787542994?pr=17330#step:9:39 are odd:

___________ IREE compile and run: opt-125M::gpu_vulkan_real_weights ____________
Error invoking iree-compile
Error code: 1
Stderr diagnostics:
opt-125M.mlirbc:0:0: error: attempting to parse a byte at the end of the bytecode
opt-125M.mlirbc:0:0: note: in bytecode version 6 produced by: MLIR19.0.0git


Invoked with:
  cd /home/esaimana/actions-runner/_work/iree/iree/SHARK-TestSuite/iree_tests/pytorch/models/opt-125M && iree-compile opt-125M.mlirbc --iree-hal-target-backends=vulkan-spirv -o opt-125M_gpu_vulkan_real_weights.vmfb

I don't think that file has been modified, and the test was passing before these changes. Did the bytecode format or its parsing change?
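
One way to separate "the file is corrupt" from "the parser changed" is to round-trip the bytecode with upstream `mlir-opt`. A hedged sketch (the flags are upstream MLIR flags; whether stock `mlir-opt` handles this particular file's dialects cleanly is an assumption):

```bash
# Can the current parser read the old file at all?
# mlir-opt reads .mlirbc directly; the torch/onnx dialects in this model are
# not registered in stock mlir-opt, hence the extra flag.
mlir-opt --allow-unregistered-dialect opt-125M.mlirbc -o /tmp/opt-125M.mlir

# If that works, re-emit bytecode with the current toolchain; the bytecode
# version can be pinned to match the failing file's "bytecode version 6".
mlir-opt --allow-unregistered-dialect /tmp/opt-125M.mlir \
  --emit-bytecode --emit-bytecode-version=6 -o /tmp/opt-125M.new.mlirbc
```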

bjacob (Contributor, Author):

This is still happening. @MaheshRavishankar @ScottTodd, what do we do? Can we check whether an MLIR bytecode version bump just occurred, and then either regenerate the file or temporarily disable this test?

Member:

I don't see any recent changes in https://github.com/llvm/llvm-project/tree/main/mlir/include/mlir/Bytecode or https://github.com/llvm/llvm-project/tree/main/mlir/lib/Bytecode. Not sure why just that one test would be failing, so I'd like to investigate/debug.

To keep the integrate train rolling, we can disable them by marking them as expected compile failures in the models files at https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite, as sketched below.
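
For illustration, a hypothetical way to record those expectations (the config file name, the `expected_compile_failures` key, and the use of `jq` rather than a hand edit are all assumptions modeled on that directory's conventions):

```bash
# Hypothetical: append the two newly-failing tests to the Vulkan models
# config's "expected_compile_failures" list, then review the diff.
jq '.expected_compile_failures += ["pytorch/models/opt-125M", "pytorch/models/resnet50"]' \
  build_tools/pkgci/external_test_suite/models_gpu_vulkan.json > /tmp/cfg.json
mv /tmp/cfg.json build_tools/pkgci/external_test_suite/models_gpu_vulkan.json
git diff build_tools/pkgci/external_test_suite/
```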

Contributor:

I thought Vulkan was always failing for SDXL?

ScottTodd (Member) commented May 9, 2024:

Oh, the resnet test is also failing. Those two (opt-125M and resnet50) are new, real failures in parsing the bytecode. The SDXL Vulkan failures are already marked XFAIL; ignore those.

Contributor:

Let's wait for CI to run again. This might be related to the vector.shuffle/vector.interleave issue.
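
For background: per the integrate notes later in this thread, llvm/llvm-project#89131 made some lowerings emit `vector.interleave` where an even/odd `vector.shuffle` used to appear, and not all GPU codegen paths handled the new op yet. A minimal sketch of the shuffle form of an interleave, runnable with upstream `mlir-opt` (illustrative only, not the exact IR IREE hits):

```bash
# Even/odd interleave of two 4-element vectors, written as the
# vector.shuffle that GPU lowerings already understood. #89131 started
# producing vector.interleave for this pattern instead (that op's exact
# syntax at this revision is omitted here).
mlir-opt - <<'EOF'
func.func @interleave(%a: vector<4xf32>, %b: vector<4xf32>) -> vector<8xf32> {
  %0 = vector.shuffle %a, %b [0, 4, 1, 5, 2, 6, 3, 7]
      : vector<4xf32>, vector<4xf32>
  return %0 : vector<8xf32>
}
EOF
```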

Contributor:

Oh no, this is bad! It is failing on ROCm now. Maybe the error is misleading, but is it failing to read the input file?

bjacob (Contributor, Author) commented May 9, 2024:

@AmosLewis:

> Could you bump torch-mlir one more commit forward?

OK, since it's just one more commit (we need this process to converge now).

> Why not bump IREE's llvm-project to dabdec1001dc368373dd581cf72f37a440873ce, the same commit that torch-mlir bce800a3 brings in?

Because IREE requires additional fixes.

bjacob changed the title from "Integrate both llvm-project@2083e97e and torch-mlir@bce800a3" to "Integrate both llvm-project@2083e97e and torch-mlir@ec6d7aa5d28" (May 9, 2024)
bjacob changed the title from "Integrate both llvm-project@2083e97e and torch-mlir@ec6d7aa5d28" to "Integrate both llvm-project@2083e97e (+ 1 revert) and torch-mlir@ec6d7aa5d28" (May 9, 2024)
MaheshRavishankar (Contributor):

There are some other tests also failing with the same bytecode parsing error.

ScottTodd (Member):

> There are some other tests also failing with the same bytecode parsing error.

https://github.com/iree-org/iree/actions/runs/9023942190/job/24797359659?pr=17330#step:7:720

/work/build-e2e-test-artifacts/e2e_test_artifacts/model_Falcon7bGptqPT.mlirbc:0:0: error: attempting to parse a byte at the end of the bytecode
/work/build-e2e-test-artifacts/e2e_test_artifacts/model_Falcon7bGptqPT.mlirbc:0:0: note: in bytecode version 6 produced by: MLIR18.0.0git

Okay, phew (or eek): it's not just files that I uploaded myself. That rules out some things. Let's try to find whatever upstream change(s) affected the parsing.
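
One systematic way to hunt for the culprit is to bisect the llvm-project submodule between the last good integrate and this one, using the failing parse as the test. A sketch; the SHAs, build directory, and model path are placeholders, and a build failure would be misclassified as "bad":

```bash
cd third_party/llvm-project
git bisect start <this-integrate-llvm-sha> <previous-integrate-llvm-sha>
# git bisect run treats exit 0 as good and nonzero as bad.
git bisect run sh -c '
  cmake --build ../../build --target iree-compile &&
  ../../build/tools/iree-compile /path/to/model_Falcon7bGptqPT.mlirbc \
      --iree-hal-target-backends=llvm-cpu -o /dev/null
'
git bisect reset
```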

ScottTodd (Member):

We might have a CUDA test newly timing out: https://github.com/iree-org/iree/actions/runs/9023942176/job/24797058825?pr=17330. I had to cancel that job after 6 hours.

* Edited the source .py files to remove benchmarks that failed on CI with `error: attempting to parse a byte at the end of the bytecode`
* Ran `bash ./build_tools/scripts/generate_cmake_files.sh`
* Fixed path separators (Windows vs Linux)
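
The cleanup steps above roughly correspond to the following sketch (the `generate_cmake_files.sh` path is from the commit message; the `sed` normalization is an assumption about where the backslashes crept in):

```bash
# After removing the failing benchmarks from the source .py definitions,
# regenerate the checked-in CMake files:
bash ./build_tools/scripts/generate_cmake_files.sh

# Normalize Windows-style path separators in the regenerated files
# (assumption: backslashes appear only in paths there); review before commit.
git diff --name-only | xargs -r sed -i 's|\\|/|g'
git diff
```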
bjacob changed the title from "Integrate both llvm-project@2083e97e (+ 1 revert) and torch-mlir@ec6d7aa5d28" to "Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@ec6d7aa5d28" (May 10, 2024)
bjacob changed the title from "Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@ec6d7aa5d28" to "Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3" (May 10, 2024)
bjacob (Contributor, Author) commented May 10, 2024:

@AmosLewis:

> Could you bump torch-mlir one more commit forward?

> OK, since it's just one more commit (we need this process to converge now).

Actually, this (llvm/torch-mlir#3013) turned out to cause additional problems here: #17345. This integrate has been reverted to its original torch-mlir target.

Discord conversation:
https://discord.com/channels/689900678990135345/1080178290188374049/1238530135444029574

ScottTodd (Member) left a review:

LGTM when CI goes green (assuming also that benchmarks don't have any glaring regressions)


Abbreviated Benchmark Summary

@ commit eab01522fae6b391ca7cb61469647c3862c11e0e (vs. base c81496c512d01cda26233932aba1358b8abe8164)

Data-Tiling Comparison Table

Latencies in ms; each cell shows latency (speedup vs. the No-DT baseline).

| Name | No-DT (baseline) | DT-Only | DT-UK |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 218.871 (1.0X) | 138.787 (1.6X) | 114.534 (1.9X) |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 682.239 (1.0X) | 279.415 (2.4X) | 229.550 (3.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.321 (1.0X) | 40.425 (0.8X) | 33.093 (1.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.951 (1.0X) | 9.505 (0.7X) | 8.592 (0.8X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 266.821 (1.0X) | 262.557 (1.0X) | 232.773 (1.1X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.684 (1.0X) | 36.950 (0.9X) | 33.602 (1.0X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 28.917 (1.0X) | 52.818 (0.5X) | 15.415 (1.9X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.905 (1.0X) | 11.113 (0.5X) | 5.288 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.217 (1.0X) | 36.882 (1.9X) | 38.583 (1.8X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.939 (1.0X) | 8.716 (1.0X) | 8.485 (1.1X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 87.885 (1.0X) | 41.768 (2.1X) | 40.569 (2.2X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 10.548 (1.0X) | 8.647 (1.2X) | 8.206 (1.3X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 76.774 (1.0X) | 85.352 (0.9X) | 62.002 (1.2X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.019 (1.0X) | 14.663 (0.8X) | 12.829 (0.9X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 180.607 (1.0X) | 250.536 (0.7X) | 187.278 (1.0X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.365 (1.0X) | 62.764 (0.5X) | 57.374 (0.6X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.777 (1.0X) | 251.807 (0.7X) | 191.730 (0.9X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.947 (1.0X) | 62.745 (0.6X) | 58.262 (0.6X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 481.072 (1.0X) | 1055.075 (0.5X) | 213.682 (2.3X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 61.027 (1.0X) | 219.863 (0.3X) | 64.103 (1.0X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 28.163 (1.0X) | 22.544 (1.2X) | 18.165 (1.6X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.724 (1.0X) | 5.149 (0.9X) | 4.477 (1.1X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.959 (1.0X) | 15.855 (0.8X) | 12.312 (1.0X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.640 (1.0X) | 5.260 (0.7X) | 4.874 (0.7X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.470 (1.0X) | 44.649 (0.5X) | 14.058 (1.5X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.751 (1.0X) | 9.903 (0.6X) | 5.591 (1.0X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.796 (1.0X) | 3.858 (0.7X) | 3.088 (0.9X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.911 (1.0X) | 3.931 (0.7X) | 3.221 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.853 (1.0X) | 40.555 (0.9X) | 32.258 (1.1X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.390 (1.0X) | 10.683 (0.8X) | 9.524 (0.9X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.704 (1.0X) | 1.422 (0.5X) | 0.600 (1.2X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.775 (1.0X) | 1.499 (0.5X) | 0.661 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.600 (1.0X) | 26.606 (0.7X) | 21.143 (0.8X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.213 (1.0X) | 6.087 (0.7X) | 5.257 (0.8X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.563 (1.0X) | 7.603 (1.0X) | 7.587 (1.0X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 49.615 (1.0X) | 84.214 (0.6X) | 79.572 (0.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 51.567 (1.0X) | 86.287 (0.6X) | 79.379 (0.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 31.100 (1.0X) | 49.997 (0.6X) | 46.370 (0.7X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 92.140 (1.0X) | 22.242 (4.1X) | 22.435 (4.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 92.637 (1.0X) | 22.244 (4.2X) | 22.490 (4.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 52.552 (1.0X) | 22.105 (2.4X) | 22.205 (2.4X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 141.830 (1.0X) | 28.417 (5.0X) | 27.302 (5.2X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 141.426 (1.0X) | 30.145 (4.7X) | 29.499 (4.8X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 76.736 (1.0X) | 26.926 (2.8X) | 26.157 (2.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 701.147 (1.0X) | 445.483 (1.6X) | 363.188 (1.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 713.215 (1.0X) | 452.159 (1.6X) | 371.107 (1.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 410.945 (1.0X) | 274.823 (1.5X) | 224.618 (1.8X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1045.946 (1.0X) | 621.012 (1.7X) | 256.017 (4.1X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1046.877 (1.0X) | 625.588 (1.7X) | 260.366 (4.0X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 549.582 (1.0X) | 340.229 (1.6X) | 152.081 (3.6X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2097.472 (1.0X) | 1084.034 (1.9X) | 305.720 (6.9X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2098.906 (1.0X) | 1086.373 (1.9X) | 307.993 (6.8X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1134.974 (1.0X) | 612.057 (1.9X) | 185.600 (6.1X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 12.116 (1.0X) | 10.035 (1.2X) | 1.460 (8.3X) |
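
For readers decoding the column labels: No-DT compiles without data tiling, DT-Only enables data tiling, and DT-UK additionally enables microkernels. A hedged sketch of the corresponding `iree-compile` invocations (flag names reflect IREE's CPU options of this era and are assumptions, not taken from the bot's configuration):

```bash
# No-DT baseline:
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling=false -o no_dt.vmfb

# DT-Only: data tiling on, microkernels off:
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling=true --iree-llvmcpu-enable-ukernels=none -o dt_only.vmfb

# DT-UK: data tiling plus microkernels:
iree-compile model.mlir --iree-hal-target-backends=llvm-cpu \
  --iree-opt-data-tiling=true --iree-llvmcpu-enable-ukernels=all -o dt_uk.vmfb
```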

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 112.795 (vs. 89.268, 26.36%↑) | 113.191 | 1.329 |
| MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 97.999 (vs. 81.405, 20.38%↑) | 97.928 | 0.602 |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 86.750 (vs. 79.551, 9.05%↑) | 86.738 | 0.323 |

[Top 3 of 7 results shown]

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 29.499 (vs. 32.260, 8.56%↓) | 29.673 | 0.880 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 30.145 (vs. 32.837, 8.20%↓) | 30.613 | 1.036 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 49.997 (vs. 53.585, 6.70%↓) | 50.484 | 1.594 |

[Top 3 of 15 results shown]

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

ScottTodd merged commit a3b7e12 into iree-org:main on May 10, 2024
64 of 65 checks passed
bjacob added a commit that referenced this pull request May 14, 2024
This allows dropping our existing local revert of
llvm/llvm-project#89131 and cherry-pick of
llvm/llvm-project#91654, which we had introduced
in the earlier integrate #17330.

This locally reverts llvm/llvm-project#90802
because it causes numerical errors, reported at
llvm/llvm-project#90802 (comment).
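
Mechanically, carrying a local revert or cherry-pick on the llvm-project submodule looks roughly like this (a sketch; the SHAs corresponding to the PRs above are placeholders, and the exact workflow IREE's integrators use is an assumption):

```bash
cd third_party/llvm-project
# Drop a problematic upstream change locally:
git revert --no-edit <commit-sha-of-change-to-revert>
# Pull in a fix that hasn't reached the pinned revision yet:
git cherry-pick <commit-sha-of-needed-fix>
cd ../..
git add third_party/llvm-project
git commit -m "Carry local revert + cherry-pick on llvm-project"
```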
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Jun 5, 2024
…800a3 (iree-org#17330)

* torch-mlir integrated at bce800a.
* llvm-project integrated at 2083e97e plus local changes:
  * Reverted llvm/llvm-project#89131 locally: while this change is good in its own right, the `vector.interleave` ops that it generates (instead of `vector.shuffle`) are not handled by some GPU codegen lowerings. Filed iree-org#17346.
  * Cherry-picked the Bazel build fix llvm/llvm-project#91654.
* Several e2e tests have been temporarily disabled; follow-up work is needed to re-enable them: iree-org#17344.

---------

Co-authored-by: MaheshRavishankar <[email protected]>
Co-authored-by: Scott Todd <[email protected]>
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Jun 5, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024