
[CIR][CIRGen][Builtin][Neon] Lower neon_vabs_v and neon_vabsq_v #1081

Merged · 2 commits · Nov 25, 2024

Conversation

@ghehg (Collaborator) commented Nov 7, 2024

This implements it the same way as OG: call the LLVM AArch64 intrinsic, which eventually becomes an ARM64 instruction.
However, there is clearly an alternative: extend CIR::AbsOp and CIR::FAbsOp to support vector types and lower them only at the LLVM lowering stage to LLVM::FAbsOp or LLVM::AbsOp, provided the LLVM dialect can do the right target lowering by eventually translating them to the LLVM AArch64 intrinsic.

The question is whether it is worth doing.

Anyway, putting this diff up for suggestions and ideas.

@bcardosolopes (Member)

The question is whether it is worth doing.

It's always better to unify and/or map target-specific things to generic ones (easier for optimizers to understand later). As the person who has been working on this for some time, what's your take? When using C++ source that leads to CIR::AbsOp and CIR::FAbsOp with vectors, compared against OG, does the LLVM output get llvm.aarch64.neon.abs.v4i16 and llvm.aarch64.neon.abs.v8i16 if the vector size matches?

@ghehg (Collaborator, Author) commented Nov 10, 2024

The question is whether it is worth doing.

It's always better to unify and/or map target-specific things to generic ones (easier for optimizers to understand later). As the person who has been working on this for some time, what's your take?

I'm thinking about the same thing (just using CIR::AbsOp and CIR::FAbsOp and only lowering to intrinsics later).

When using C++ source that leads to CIR::AbsOp and CIR::FAbsOp with vectors, compared against OG, does the LLVM output get llvm.aarch64.neon.abs.v4i16 and llvm.aarch64.neon.abs.v8i16 if the vector size matches?

Unfortunately, I just did an experiment: using LLVM::AbsOp gives us LLVM IR like this:
%3 = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %0, i1 false)
It does not use the NEON-specific intrinsics, even for the triple "aarch64-none-linux-android24".

But it might be OK: we could do our own target lowering at LoweringPrepare and lower vector-typed CIR::AbsOp and CIR::FAbsOp to CIR::LLVMIntrinsicCallOp if the target supports NEON, so we can take advantage of the hardware's NEON features.

@ghehg (Collaborator, Author) commented Nov 10, 2024

Very interesting.
[Traditional clang codegen seems to only use the NEON abs intrinsic for the NEON abs builtin](https://godbolt.org/z/v8Ysxdrd1).
But the exact same hardware instruction, abs v0.4s, v0.4s, is generated for both the generic and the NEON intrinsics.

@ghehg (Collaborator, Author) commented Nov 11, 2024

My current plan is to extend AbsOp to take vector types. PR
Next, extend FAbsOp along with the other FP unary ops to support vector types.
Then we revisit this PR to decide whether to just use AbsOp and FAbsOp and lower them to the generic llvm.abs/llvm.fabs, which would give us a chance to optimize them in LLVM passes, or to lower them, either at codegen or at the lowering stage, to llvm.aarch64.neon.abs, which would not be optimized away in the generated assembly/machine code under -O0.

@bcardosolopes (Member)

But the exact same hardware instruction, abs v0.4s, v0.4s, is generated for both the generic and the NEON intrinsics.

Nice - as long as the final ASM is the same, it feels like we're good to map all those things to the same higher-level CIR operation.

Then we revisit this PR to decide whether to just use AbsOp and FAbsOp

Sounds good, thanks

@ghehg (Collaborator, Author) commented Nov 22, 2024

Back to draft; waiting for other PRs to land.

@ghehg ghehg reopened this Nov 25, 2024
@ghehg ghehg marked this pull request as ready for review November 25, 2024 20:53
@bcardosolopes bcardosolopes merged commit 41078e9 into llvm:main Nov 25, 2024
8 checks passed