Enzyme: Support for reductions with GPU broadcasting #2455

Open
jgreener64 opened this issue Jul 31, 2024 · 5 comments
Labels: enhancement, extensions

Comments

@jgreener64

Describe the bug

Reductions over GPU broadcasts throw an error with Enzyme. @wsmoses suggested I open an issue here.

To reproduce

The Minimal Working Example (MWE) for this bug:

using Enzyme, CUDA
f(x, y) = sum(x .+ y)
x = CuArray(rand(5))
y = CuArray(rand(5))
dx = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])
autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))
ERROR: Enzyme execution failed.
Enzyme compilation failed.

No create nofree of empty function (jl_gc_safe_enter) jl_gc_safe_enter)
 at context:   call fastcc void @julia__launch_configuration_979_4373([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %7, i64 noundef signext 0, { i64, {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(32) %45) #715, !dbg !1090 (julia__launch_configuration_979_4373)

Stacktrace:
 [1] launch_configuration
   @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56
 [2] #launch_heuristic#1204
   @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22
 [3] launch_heuristic
   @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15
 [4] _copyto!
   @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78
 [5] copyto!
   @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44
 [6] copy
   @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29
 [7] materialize
   @ ./broadcast.jl:903
 [8] f
   @ ./REPL[2]:1


Stacktrace:
  [1] throwerr(cstr::Cstring)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1797
  [2] launch_configuration
    @ ~/.julia/dev/CUDA/lib/cudadrv/occupancy.jl:56 [inlined]
  [3] #launch_heuristic#1204
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:22 [inlined]
  [4] launch_heuristic
    @ ~/.julia/dev/CUDA/src/gpuarrays.jl:15 [inlined]
  [5] _copyto!
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:78 [inlined]
  [6] copyto!
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:44 [inlined]
  [7] copy
    @ ~/.julia/packages/GPUArrays/bbZD0/src/host/broadcast.jl:29 [inlined]
  [8] materialize
    @ ./broadcast.jl:903 [inlined]
  [9] f
    @ ./REPL[2]:1 [inlined]
 [10] diffejulia_f_2820wrap
    @ ./REPL[2]:0
 [11] macro expansion
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6819 [inlined]
 [12] enzyme_call
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6419 [inlined]
 [13] CombinedAdjointThunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:6296 [inlined]
 [14] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:314 [inlined]
 [15] autodiff(::ReverseMode{…}, ::typeof(f), ::Type{…}, ::Duplicated{…}, ::Const{…})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:326
 [16] top-level scope
    @ REPL[6]:1
Some type information was truncated. Use `show(err)` to see complete types.

Forward mode also fails. This is with Julia 1.10.3, Enzyme 0.12.26, GPUCompiler 0.26.7 and CUDA d7077da.
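For reference, here is a minimal CPU sketch of the result the GPU call would be expected to produce once this is supported. It assumes Enzyme can differentiate the plain `Array` broadcast path in your setup, which is an assumption and not something shown in this issue. Reverse mode accumulates into the shadow, so the sketch zero-initializes `dx` instead of reusing the one-hot `dx` from the MWE.

```julia
using Enzyme

f(x, y) = sum(x .+ y)

x = rand(5)
y = rand(5)
dx = zeros(5)  # shadow of x; reverse mode accumulates the gradient into it

autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))

# d/dx[i] of sum(x .+ y) is 1 for every i, so dx should now hold all ones.
@show dx
```

Swapping `rand(5)` and `zeros(5)` for their `CuArray` counterparts gives the failing GPU case reported above.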

Julia Version 1.10.3
Commit 0b4590a5507 (2024-04-30 10:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 18 default, 0 interactive, 9 GC (on 36 virtual cores)
Environment:
  LD_LIBRARY_PATH = /usr/local/gromacs/lib

Details on CUDA:

CUDA runtime 12.5, artifact installation
CUDA driver 12.5
NVIDIA driver 535.183.1, originally for CUDA 12.2

CUDA libraries: 
- CUBLAS: 12.5.3
- CURAND: 10.3.6
- CUFFT: 11.2.3
- CUSOLVER: 11.6.3
- CUSPARSE: 12.5.1
- CUPTI: 2024.2.1 (API 23.0.0)
- NVML: 12.0.0+535.183.1

Julia packages: 
- CUDA: 5.4.3
- CUDA_Driver_jll: 0.9.1+1
- CUDA_Runtime_jll: 0.14.1+0

Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.7

2 devices:
  0: NVIDIA RTX A6000 (sm_86, 46.970 GiB / 47.988 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 4.046 GiB / 47.988 GiB available)
@wsmoses
Contributor

wsmoses commented Jul 31, 2024 via email

@maleadt
Member

maleadt commented Jul 31, 2024

I don't see how this is a CUDA.jl issue.

@wsmoses
Contributor

wsmoses commented Jul 31, 2024

Sorry, I mentioned this in the earlier Enzyme.jl issue -- I recommended Joe open an issue here, since I think the resolution is extending the Enzyme CUDA extension with a rule that says the derivative of

function GPUArrays.mapreducedim!(f::F, op::OP, R::AnyCuArray{T},
is [corresponding derivative fn].
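The "[corresponding derivative fn]" is left unspecified above. For the special case where `op` is `+`, the reverse-mode update is a broadcast of the output cotangent back over the reduced dimensions, scaled by the derivative of `f`. The following is a minimal plain-Julia sketch of that arithmetic only. The names `sum_pullback!` and `df` are hypothetical; this is not the EnzymeRules API and not code from CUDA.jl or GPUArrays. An actual rule would still have to be registered through EnzymeRules and cover other reduction operators.

```julia
# Hypothetical helper, not part of Enzyme, GPUArrays, or CUDA.jl.
# Reverse-mode update for an additive reduction R = sum(f, A; dims=...):
# each input element A[i] receives df(A[i]) times the cotangent of the
# output slot it was reduced into. `df` is the derivative of `f`.
function sum_pullback!(dA::AbstractArray, df, A::AbstractArray, dR::AbstractArray)
    # dR has size 1 along every reduced dimension, so broadcasting
    # expands it back to the shape of A.
    dA .+= df.(A) .* dR
    return dA
end

A  = [1.0 2.0 3.0; 4.0 5.0 6.0]
dR = reshape([10.0, 20.0], 2, 1)  # cotangent of sum(abs2, A; dims=2)
dA = zeros(size(A))               # shadow of A, zero-initialized

sum_pullback!(dA, a -> 2a, A, dR) # derivative of abs2 is a -> 2a
```

Because the update is written as a single broadcast, the same sketch should in principle apply to GPU arrays, though that is untested here.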

@maleadt added the enhancement and extensions labels and removed the bug label on Jul 31, 2024
@maleadt changed the title from "Reductions with GPU broadcasting error with Enzyme" to "Enzyme: Support for reductions with GPU broadcasting" on Jul 31, 2024
@maleadt
Member

maleadt commented Jul 31, 2024

Fair enough! Hope you don't mind me assigning the issue to you then 🙂

@wsmoses
Contributor

wsmoses commented Jul 31, 2024

Oh yeah for sure, kind of assumed that :P
