Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky fails on SYCL :: AtomicRef/atomic_memory_order_acq_rel.cpp #8847

Closed
aelovikov-intel opened this issue Mar 28, 2023 · 6 comments
Closed
Assignees
Labels
bug Something isn't working confirmed

Comments

@aelovikov-intel
Copy link
Contributor

It seems the issues has been seen at least twice recently:

https://github.com/intel/llvm/actions/runs/4547459972/jobs/8018077332 (post-commit run for 96acbff). Failed on GEN9 L0 like this:

$ "env" "ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu" "/__w/llvm/llvm/build-e2e/AtomicRef/Output/atomic_memory_order_acq_rel.cpp.tmp.out"
# command output:
Testing acquire
Testing release

# command stderr:
atomic_memory_order_acq_rel.cpp.tmp.out: /__w/llvm/llvm/llvm/sycl/test-e2e/AtomicRef/atomic_memory_order_acq_rel.cpp:143: void test_release_global() [order = sycl::memory_order::release]: Assertion `error == 0' failed.

error: command failed with exit status: -6

As one of the pre-commit CI runs in #8510 (see discussion in comments there)

@aelovikov-intel aelovikov-intel added the bug Something isn't working label Mar 28, 2023
@bader
Copy link
Contributor

bader commented Mar 30, 2023

@aelovikov-intel, I see that it fails quite stably in pre- and post- commits. Please, disable or fix the test as soon as possible.

@jandres742
Copy link
Contributor

this test is flaky here #8732

@bader
Copy link
Contributor

bader commented Mar 31, 2023

@andylshort, I think this issue might be caused by your PR. #8825
The way test is built assumes that we built for CUDA only. Right?

// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -O3 -o %t.out -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_70

@bader
Copy link
Contributor

bader commented Mar 31, 2023

I also noticed that test doesn't check atomic_memory_order_capabilities of tested devices.
Maybe devices in GitHub CI do not support acq_rel property.

bader added a commit that referenced this issue Mar 31, 2023
@andylshort
Copy link

@andylshort, I think this issue might be caused by your PR. #8825 The way test is built assumes that we built for CUDA only. Right?

// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -O3 -o %t.out -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_70

Yes but they're unused in the failing GEN9 test runs so I think they're okay to keep, but could be written better to avoid the clang++: warning: argument unused during compilation: '-Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_70' [-Wunused-command-line-argument].

I also noticed that test doesn't check atomic_memory_order_capabilities of tested devices. Maybe devices in GitHub CI do not support acq_rel property.

I also think they might not property support ace_rel. My earlier patch should have implemented the PI call for atomic_memory_order_capabilities so the devices are actually being queried for this information now. I'm currently working on another bug at the moment that utilises memory orders and atomic_ref in a similar way to this failing test and that too fails on L0 and OCL GPU backends.

I believe both are caused by the same issue, and it's likely a lower-level issue with SPIRV or the GPU driver.

@maarquitos14
Copy link
Contributor

Fixed by #9111.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working confirmed
Projects
None yet
Development

No branches or pull requests

5 participants