AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. #117823
Merged (+305 −2)
This was referenced Nov 27, 2024
arsenm requested review from jayfoad, pravinjagtap, rampitec, shiltian, Sisyph and srpande on November 27, 2024 00:55
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Co-authored-by: Pravin Jagtap <[email protected]>

Patch is 25.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117823.diff

11 Files Affected:
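The patch itself does not spell out what `v_bitop3_b32` / `v_bitop3_b16` compute. It only shows that the fourth operand is an 8-bit immediate (`llvm_i8_ty` marked `ImmArg`, forwarded by the ISel patterns as `timm:$bitop3`). The sketch below is a hypothetical C++ reference model, assuming the immediate is a truth table over the three sources and that each bit position indexes it as `(src0 << 2) | (src1 << 1) | src2`, the same convention x86 `vpternlog` uses. That bit ordering is an assumption not confirmed by this PR; check the gfx950 ISA documentation before relying on it. `bitop3_ref`, `bitop3_b32_ref` and `bitop3_b16_ref` are illustrative names, not LLVM or Clang APIs.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical reference model of v_bitop3_b32 / v_bitop3_b16 (not from the PR).
// Assumed semantics: for each bit position i, form the 3-bit index
//   idx = (src0[i] << 2) | (src1[i] << 1) | src2[i]
// and set result[i] to bit idx of the 8-bit truth table `imm`.
template <typename T>
constexpr T bitop3_ref(T src0, T src1, T src2, std::uint8_t imm) {
  static_assert(std::is_unsigned_v<T>, "model lanes as unsigned integers");
  T result = 0;
  for (unsigned i = 0; i < sizeof(T) * 8; ++i) {
    const unsigned idx = (((src0 >> i) & 1u) << 2) |
                         (((src1 >> i) & 1u) << 1) |
                         ((src2 >> i) & 1u);
    if ((imm >> idx) & 1u)
      result = static_cast<T>(result | (T{1} << i));
  }
  return result;
}

// 32-bit and 16-bit flavours, mirroring the operand widths of the two builtins.
constexpr std::uint32_t bitop3_b32_ref(std::uint32_t a, std::uint32_t b,
                                       std::uint32_t c, std::uint8_t imm) {
  return bitop3_ref(a, b, c, imm);
}

constexpr std::uint16_t bitop3_b16_ref(std::uint16_t a, std::uint16_t b,
                                       std::uint16_t c, std::uint8_t imm) {
  return bitop3_ref(a, b, c, imm);
}
```

Under this assumed convention, `0xF0`, `0xCC` and `0xAA` pass src0, src1 and src2 through unchanged, and `0x96` yields `src0 ^ src1 ^ src2`; the `i8 15` (`bitop3:0xf`) used in the first llc test would then compute `~src0`.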
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index 54bbec97b17702..7ed488d3a7e515 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -614,6 +614,8 @@ TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk32_bf6_f32, "V6UiV32fUif", "nc
TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk32_fp6_bf16, "V6UiV32yUif", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk32_fp6_f16, "V6UiV32hUif", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_sr_pk32_fp6_f32, "V6UiV32fUif", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts")
+TARGET_BUILTIN(__builtin_amdgcn_bitop3_b32, "iiiiIUc", "nc", "bitop3-insts")
+TARGET_BUILTIN(__builtin_amdgcn_bitop3_b16, "ssssIUc", "nc", "bitop3-insts")
#undef BUILTIN
#undef TARGET_BUILTIN
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f32d5a2f43559a..e0504c0e38b22a 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -748,6 +748,18 @@ static Value *emitBuiltinWithOneOverloadedType(CodeGenFunction &CGF,
return CGF.Builder.CreateCall(F, Args, Name);
}
+// Emit an intrinsic that has 4 operands of the same type as its result.
+static Value *emitQuaternaryBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+ unsigned IntrinsicID) {
+ llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
+ llvm::Value *Src1 = CGF.EmitScalarExpr(E->getArg(1));
+ llvm::Value *Src2 = CGF.EmitScalarExpr(E->getArg(2));
+ llvm::Value *Src3 = CGF.EmitScalarExpr(E->getArg(3));
+
+ Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
+ return CGF.Builder.CreateCall(F, {Src0, Src1, Src2, Src3});
+}
+
// Emit an intrinsic that has 1 float or double operand, and 1 integer.
static Value *emitFPIntBuiltin(CodeGenFunction &CGF,
const CallExpr *E,
@@ -20250,6 +20262,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
Builder.CreateInsertElement(Insert0, Elt1, UINT64_C(1));
return AsVector;
}
+ case AMDGPU::BI__builtin_amdgcn_bitop3_b32:
+ case AMDGPU::BI__builtin_amdgcn_bitop3_b16:
+ return emitQuaternaryBuiltin(*this, E, Intrinsic::amdgcn_bitop3);
case AMDGPU::BI__builtin_amdgcn_make_buffer_rsrc:
return emitBuiltinWithOneOverloadedType<4>(
*this, E, Intrinsic::amdgcn_make_buffer_rsrc);
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index d8f3025046d2e0..f629262ac58879 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
// GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
// GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
// GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+bf8-cvt-scale-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp4-cvt-scale-insts,+fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-cvt-scale-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+bf8-cvt-scale-insts,+bitop3-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp4-cvt-scale-insts,+fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-cvt-scale-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
// GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
// GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
// GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950-err.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950-err.cl
index 144a512fdf6edb..81195e721c13f6 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950-err.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950-err.cl
@@ -11,6 +11,7 @@
// REQUIRES: amdgpu-registered-target
typedef unsigned int uint;
+typedef unsigned short ushort;
typedef unsigned int uint2 __attribute__((ext_vector_type(2)));
typedef half __attribute__((ext_vector_type(2))) half2;
typedef short __attribute__((ext_vector_type(2))) short2;
@@ -21,7 +22,7 @@ typedef unsigned int __attribute__((ext_vector_type(6))) uint6;
typedef half __attribute__((ext_vector_type(32))) half32;
typedef __bf16 __attribute__((ext_vector_type(32))) bfloat32;
-void test(global uint* out, global uint2* out_v2u32, uint a, uint b, global half2* out_v2f16, global float* out_f32, float scale, global short2* out_v2i16, float src0, float src1,
+void test(global uint* out, global uint2* out_v2u32, uint a, uint b, uint c, global half2* out_v2f16, global float* out_f32, float scale, global short2* out_v2i16, float src0, float src1,
float2 src0_v2f32, global float2* out_v2f32, half2 src0_v2f16, bfloat2 src0_v2bf16, global bfloat2* out_v2bf16, global float32* out_v32f32, uint6 src_v6i32,
global half32 *out_v32f16, global bfloat32 *out_v32bf16) {
*out = __builtin_amdgcn_prng_b32(a); // expected-error{{'__builtin_amdgcn_prng_b32' needs target feature prng-inst}}
@@ -58,4 +59,6 @@ void test(global uint* out, global uint2* out_v2u32, uint a, uint b, global half
*out = __builtin_amdgcn_cvt_scalef32_sr_pk_fp4_f16(*out, src0_v2f16, 0, scale, 0); // expected-error{{'__builtin_amdgcn_cvt_scalef32_sr_pk_fp4_f16' needs target feature fp4-cvt-scale-insts}}
*out = __builtin_amdgcn_cvt_scalef32_sr_pk_fp4_bf16(*out, src0_v2bf16, 0, scale, 0); // expected-error{{'__builtin_amdgcn_cvt_scalef32_sr_pk_fp4_bf16' needs target feature fp4-cvt-scale-insts}}
*out = __builtin_amdgcn_cvt_scalef32_sr_pk_fp4_f32(*out, src0_v2f32, 0, scale, 0); // expected-error{{'__builtin_amdgcn_cvt_scalef32_sr_pk_fp4_f32' needs target feature fp4-cvt-scale-insts}}
+ *out = __builtin_amdgcn_bitop3_b32(a, b, c, 1); // expected-error {{'__builtin_amdgcn_bitop3_b32' needs target feature bitop3-insts}}
+ *out = __builtin_amdgcn_bitop3_b16((ushort)a, (ushort)b, (ushort)c, 1); // expected-error {{'__builtin_amdgcn_bitop3_b16' needs target feature bitop3-insts}}
}
diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
index bbfb16e05a53e0..2e1025e895c636 100644
--- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
+++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl
@@ -5,6 +5,7 @@
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
typedef unsigned int uint;
+typedef unsigned short ushort;
typedef unsigned int __attribute__((ext_vector_type(2))) uint2;
typedef unsigned int __attribute__((ext_vector_type(6))) uint6;
typedef __bf16 __attribute__((ext_vector_type(32))) bfloat32;
@@ -1658,3 +1659,47 @@ void test_cvt_scalef32_sr_fp8_f32(global unsigned *out, float src, uint seed, fl
*out = __builtin_amdgcn_cvt_scalef32_sr_fp8_f32(*out, src, seed, scale, 1);
*out = __builtin_amdgcn_cvt_scalef32_sr_fp8_f32(*out, src, seed, scale, 2);
}
+
+// CHECK-LABEL: @test_bitop3_b32(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT: [[C_ADDR:%.*]] = alloca i32, align 4, addrspace(5)
+// CHECK-NEXT: store ptr addrspace(1) [[OUT:%.*]], ptr addrspace(5) [[OUT_ADDR]], align 8
+// CHECK-NEXT: store i32 [[A:%.*]], ptr addrspace(5) [[A_ADDR]], align 4
+// CHECK-NEXT: store i32 [[B:%.*]], ptr addrspace(5) [[B_ADDR]], align 4
+// CHECK-NEXT: store i32 [[C:%.*]], ptr addrspace(5) [[C_ADDR]], align 4
+// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(5) [[A_ADDR]], align 4
+// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr addrspace(5) [[B_ADDR]], align 4
+// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(5) [[C_ADDR]], align 4
+// CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.amdgcn.bitop3.i32(i32 [[TMP0]], i32 [[TMP1]], i32 [[TMP2]], i8 1)
+// CHECK-NEXT: [[TMP4:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT_ADDR]], align 8
+// CHECK-NEXT: store i32 [[TMP3]], ptr addrspace(1) [[TMP4]], align 4
+// CHECK-NEXT: ret void
+//
+void test_bitop3_b32(global uint* out, uint a, uint b, uint c) {
+ *out = __builtin_amdgcn_bitop3_b32(a, b, c, 1);
+}
+
+// CHECK-LABEL: @test_bitop3_b16(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
+// CHECK-NEXT: [[A_ADDR:%.*]] = alloca i16, align 2, addrspace(5)
+// CHECK-NEXT: [[B_ADDR:%.*]] = alloca i16, align 2, addrspace(5)
+// CHECK-NEXT: [[C_ADDR:%.*]] = alloca i16, align 2, addrspace(5)
+// CHECK-NEXT: store ptr addrspace(1) [[OUT:%.*]], ptr addrspace(5) [[OUT_ADDR]], align 8
+// CHECK-NEXT: store i16 [[A:%.*]], ptr addrspace(5) [[A_ADDR]], align 2
+// CHECK-NEXT: store i16 [[B:%.*]], ptr addrspace(5) [[B_ADDR]], align 2
+// CHECK-NEXT: store i16 [[C:%.*]], ptr addrspace(5) [[C_ADDR]], align 2
+// CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr addrspace(5) [[A_ADDR]], align 2
+// CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr addrspace(5) [[B_ADDR]], align 2
+// CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr addrspace(5) [[C_ADDR]], align 2
+// CHECK-NEXT: [[TMP3:%.*]] = call i16 @llvm.amdgcn.bitop3.i16(i16 [[TMP0]], i16 [[TMP1]], i16 [[TMP2]], i8 1)
+// CHECK-NEXT: [[TMP4:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT_ADDR]], align 8
+// CHECK-NEXT: store i16 [[TMP3]], ptr addrspace(1) [[TMP4]], align 2
+// CHECK-NEXT: ret void
+//
+void test_bitop3_b16(global ushort* out, ushort a, ushort b, ushort c) {
+ *out = __builtin_amdgcn_bitop3_b16(a, b, c, 1);
+}
diff --git a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
index 65078312a6e8a8..c0ea2f961337e1 100644
--- a/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
+++ b/clang/test/SemaOpenCL/builtins-amdgcn-error-gfx950-param.cl
@@ -15,6 +15,7 @@ typedef half half2 __attribute__((ext_vector_type(2)));
typedef short short2 __attribute__((ext_vector_type(2)));
typedef float float2 __attribute__((ext_vector_type(2)));
typedef __bf16 bfloat2 __attribute__((ext_vector_type(2)));
+typedef unsigned short ushort;
void test_mfma_f32_16x16x32_f16(__global float4* out, half8 a, half8 b, float4 c, int X) {
@@ -198,3 +199,8 @@ void test_cvt_scalef32(global half2* out_v2f16, global float* out_f32, uint src,
*out = __builtin_amdgcn_cvt_scalef32_sr_fp8_f16(*out, src0, 0, scale, index); // expected-error{{argument to '__builtin_amdgcn_cvt_scalef32_sr_fp8_f16' must be a constant integer}}
*out = __builtin_amdgcn_cvt_scalef32_sr_fp8_f32(*out, src0, 0, scale, index); // expected-error{{argument to '__builtin_amdgcn_cvt_scalef32_sr_fp8_f32' must be a constant integer}}
}
+
+void test_bitop3_args(global uint* out, uint a, uint b, uint c) {
+ *out = __builtin_amdgcn_bitop3_b32(a, b, c, a); // expected-error {{argument to '__builtin_amdgcn_bitop3_b32' must be a constant integer}}
+ *out = __builtin_amdgcn_bitop3_b16((ushort)a, (ushort)b, (ushort)c, a); // expected-error {{argument to '__builtin_amdgcn_bitop3_b16' must be a constant integer}}
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index 1561942c81c67d..eb3167506faec9 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -760,6 +760,11 @@ def int_amdgcn_prng_b32 : DefaultAttrsIntrinsic<
[llvm_i32_ty], [llvm_i32_ty], [IntrNoMem]
>, ClangBuiltin<"__builtin_amdgcn_prng_b32">;
+def int_amdgcn_bitop3 :
+ DefaultAttrsIntrinsic<[llvm_anyint_ty],
+ [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>, llvm_i8_ty],
+ [IntrNoMem, IntrSpeculatable, ImmArg<ArgIndex<3>>]>;
+
} // TargetPrefix = "amdgcn"
// New-style image intrinsics
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index 8d389d2fb9ea9c..ff994aef3951de 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4937,6 +4937,14 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
break;
}
+ case Intrinsic::amdgcn_bitop3: {
+ unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
+ OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
+ OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
+ OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
+ OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
+ break;
+ }
case Intrinsic::amdgcn_s_quadmask:
case Intrinsic::amdgcn_s_wqm: {
Register MaskReg = MI.getOperand(2).getReg();
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 9ef52c0feb7233..5a78632a3de055 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -1266,6 +1266,15 @@ let SubtargetPredicate = HasBitOp3Insts in {
defm V_BITOP3_B32 : VOP3Inst <"v_bitop3_b32",
VOP3_BITOP3_Profile<VOPProfile <[i32, i32, i32, i32, i8]>, VOP3_REGULAR>>;
}
+ def : GCNPat<
+ (i32 (int_amdgcn_bitop3 i32:$src0, i32:$src1, i32:$src2, i8:$bitop3)),
+ (i32 (V_BITOP3_B32_e64 VSrc_b32:$src0, VSrc_b32:$src1, VSrc_b32:$src2, timm:$bitop3))
+ >;
+
+ def : GCNPat<
+ (i16 (int_amdgcn_bitop3 i16:$src0, i16:$src1, i16:$src2, i8:$bitop3)),
+ (i16 (V_BITOP3_B16_e64 0, VSrc_b16:$src0, 0, VSrc_b16:$src1, 0, VSrc_b16:$src2, timm:$bitop3, 0))
+ >;
} // End SubtargetPredicate = HasBitOp3Insts
class DivFmasPat<ValueType vt, Instruction inst, Register CondReg> : GCNPat<
diff --git a/llvm/lib/TargetParser/TargetParser.cpp b/llvm/lib/TargetParser/TargetParser.cpp
index 969214b76ee927..ece2df0b13dd33 100644
--- a/llvm/lib/TargetParser/TargetParser.cpp
+++ b/llvm/lib/TargetParser/TargetParser.cpp
@@ -473,6 +473,7 @@ void AMDGPU::fillAMDGPUFeatureMap(StringRef GPU, const Triple &T,
Features["gws"] = true;
break;
case GK_GFX950:
+ Features["bitop3-insts"] = true;
Features["fp6bf6-cvt-scale-insts"] = true;
Features["fp4-cvt-scale-insts"] = true;
Features["bf8-cvt-scale-insts"] = true;
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.bitop3.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.bitop3.ll
new file mode 100644
index 00000000000000..ff2f4db0d7a5f9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.bitop3.ll
@@ -0,0 +1,209 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -march=amdgcn -mcpu=gfx950 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX950-SDAG %s
+; RUN: llc -global-isel -march=amdgcn -mcpu=gfx950 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX950-GISEL %s
+
+declare i32 @llvm.amdgcn.bitop3.i32(i32, i32, i32, i8)
+declare i16 @llvm.amdgcn.bitop3.i16(i16, i16, i16, i8)
+
+define amdgpu_ps float @bitop3_b32_vvv(i32 %a, i32 %b, i32 %c) {
+; GCN-LABEL: bitop3_b32_vvv:
+; GCN: ; %bb.0:
+; GCN-NEXT: v_bitop3_b32 v0, v0, v1, v2 bitop3:0xf
+; GCN-NEXT: ; return to shader part epilog
+ %ret = call i32 @llvm.amdgcn.bitop3.i32(i32 %a, i32 %b, i32 %c, i8 15)
+ %ret_cast = bitcast i32 %ret to float
+ ret float %ret_cast
+}
+
+define amdgpu_ps float @bitop3_b32_svv(i32 inreg %a, i32 %b, i32 %c) {
+; GCN-LABEL: bitop3_b32_svv:
+; GCN: ; %bb.0:
+; GCN-NEXT: v_bitop3_b32 v0, s0, v0, v1 bitop3:0x10
+; GCN-NEXT: ; return to shader part epilog
+ %ret = call i32 @llvm.amdgcn.bitop3.i32(i32 %a, i32 %b, i32 %c, i8 16)
+ %ret_cast = bitcast i32 %ret to float
+ ret float %ret_cast
+}
+
+define amdgpu_ps float @bitop3_b32_ssv(i32 inreg %a, i32 inreg %b, i32 %c) {
+; GCN-LABEL: bitop3_b32_ssv:
+; GCN: ; %bb.0:
+; GCN-NEXT: v_mov_b32_e32 v1, s1
+; GCN-NEXT: v_bitop3_b32 v0, s0, v1, v0 bitop3:0x11
+; GCN-NEXT: ; return to shader part epilog
+ %ret = call i32 @llvm.amdgcn.bitop3.i32(i32 %a, i32 %b, i32 %c, i8 17)
+ %ret_cast = bitcast i32 %ret to float
+ ret float %ret_cast
+}
+
+define amdgpu_ps float @bitop3_b32_sss(i32 inreg %a, i32 inreg %b, i32 inreg %c) {
+; GCN-LABEL: bitop3_b32_sss:
+; GCN: ; %bb.0:
+; GCN-NEXT: v_mov_b32_e32 v0, s1
+; GCN-NEXT: v_mov_b32_e32 v1, s2
+; GCN-NEXT: v_bitop3_b32 v0, s0, v0, v1 bitop3:0x12
+; GCN-NEXT: ; return to shader part epilog
+ %ret = call i32 @llvm.amdgcn.bitop3.i32(i32 %a, i32 %b, i32 %c, i8 18)
+ %ret_cast = bitcast i32 %ret to float
+ ret float %ret_cast
+}
+
+define amdgpu_ps float @bitop3_b32_vvi(i32 %a, i32 %b) {
+; GFX950-SDAG-LABEL: bitop3_b32_vvi:
+; GFX950-SDAG: ; %bb.0:
+; GFX950-SDAG-NEXT: s_movk_i32 s0, 0x3e8
+; GFX950-SDAG-NEXT: v_bitop3_b32 v0, v0, v1, s0 bitop3:0x13
+; GFX950-SDAG-NEXT: ; return to shader part epilog
+;
+; GFX950-GISEL-LABEL: bitop3_b32_vvi:
+; GFX950-GISEL: ; %bb.0:
+; GFX950-GISEL-NEXT: v_mov_b32_e32 v2, 0x3e8
+; GFX950-GISEL-NEXT: v_bitop3_b32 v0, v0, v1, v2 bitop3:0x13
+; GFX950-GISEL-NEXT: ; return to shader part epilog
+ %ret = call i32 @llvm.amdgcn.bitop3.i32(i32 %a, i32 %b, i32 1000, i8 19)
+ %ret_cast = bitcast i32 %ret to float
+ ret float %ret_cast
+}
+
+define amdgpu_ps float @bitop3_b32_vii(i32 %a) {
+; GFX950-SDAG-LABEL: bitop3_b32_vii:
+; GFX950-SDAG: ; %bb.0:
+; GFX950-SDAG-NEXT: s_movk_i32 s0, 0x7d0
+; GFX950-SDAG-NEXT: v_mov_b32_e32 v1, 0x3e8
+; GFX950-SDAG-NEXT: v_bitop3_b32 v0, v0, s0, v1 bitop3:0x14
+...
[truncated]
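Because Sema rejects a non-constant fourth argument (see `test_bitop3_args` in the SemaOpenCL test above), callers must supply the truth table as a compile-time constant. A hypothetical helper like the one below could derive that byte from a boolean expression at compile time. It again assumes the index convention `(src0 << 2) | (src1 << 1) | src2`, which this patch does not document; `make_bitop3_imm` and the `k*` constants are illustrative only and are not part of Clang or LLVM.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Clang or LLVM.
// Assumed convention: bit `idx` of the immediate holds f(a, b, c), where
// idx = (a << 2) | (b << 1) | c and a, b, c come from src0, src1, src2.
template <typename F>
constexpr std::uint8_t make_bitop3_imm(F f) {
  std::uint8_t imm = 0;
  for (unsigned idx = 0; idx < 8; ++idx) {
    const bool a = (idx >> 2) & 1u;
    const bool b = (idx >> 1) & 1u;
    const bool c = idx & 1u;
    if (f(a, b, c))
      imm = static_cast<std::uint8_t>(imm | (1u << idx));
  }
  return imm;
}

// Three-input XOR: true when an odd number of inputs are set.
constexpr std::uint8_t kXor3 =
    make_bitop3_imm([](bool a, bool b, bool c) { return a != (b != c); });

// Bitwise select: take b where a is set, otherwise c.
constexpr std::uint8_t kSelect =
    make_bitop3_imm([](bool a, bool b, bool c) { return a ? b : c; });

// Majority: true when at least two of the three inputs are set.
constexpr std::uint8_t kMajority = make_bitop3_imm(
    [](bool a, bool b, bool c) { return (a && b) || (a && c) || (b && c); });

static_assert(kXor3 == 0x96, "3-input XOR");
static_assert(kSelect == 0xCA, "bit select");
static_assert(kMajority == 0xE8, "majority");
```

The resulting byte would then be written as the literal fourth argument; for example, under the assumed convention `__builtin_amdgcn_bitop3_b32(a, b, c, 0x96)` should compute `a ^ b ^ c`.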
arsenm force-pushed the users/arsenm/gfx950/allocate-different-regs-v_cvt_scalef32 branch from 418ffe2 to bbc7178 on November 27, 2024 01:02
arsenm force-pushed the users/arsenm/gfx950/codegen-bitop3 branch from 4e8c398 to f76165b on November 27, 2024 01:02
llvmbot added the clang, clang:frontend, clang:codegen and llvm:ir labels on Nov 27, 2024
shiltian approved these changes on Nov 27, 2024
arsenm force-pushed the users/arsenm/gfx950/allocate-different-regs-v_cvt_scalef32 branch from bbc7178 to 7ce3533 on November 27, 2024 04:17
arsenm force-pushed the users/arsenm/gfx950/codegen-bitop3 branch from f76165b to 8c57ac5 on November 27, 2024 04:17
arsenm force-pushed the users/arsenm/gfx950/allocate-different-regs-v_cvt_scalef32 branch from 7ce3533 to ddb0c55 on November 27, 2024 04:24
Base automatically changed from users/arsenm/gfx950/allocate-different-regs-v_cvt_scalef32 to main on November 27, 2024 04:29
Co-authored-by: Pravin Jagtap <[email protected]>
arsenm force-pushed the users/arsenm/gfx950/codegen-bitop3 branch from 8c57ac5 to 05d7ac2 on November 27, 2024 04:29
Labels: backend:AMDGPU, clang:codegen, clang:frontend, clang, llvm:ir