[CIR][CIRGen] Add support for global named registers #741

Open
wants to merge 1,671 commits into base: main

Conversation

@566hub (Contributor) commented Jul 15, 2024

Implement the RG and GET_RG ops, which are copied from the cir.global
op. The concept is that the same asm label means a unique abstract memory
address of u64i type, and get_rg returns a pointer to it. We could use
cir.cast(bitcast) to convert that pointer to a pointer of another type.
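
For context, a minimal C-level sketch (mine, not from the PR; it assumes
an AArch64 target where x20 can be reserved, e.g. via -ffixed-x20) of the
GNU global register variable extension these ops are meant to model:

```
// Hypothetical example: a global named register variable on AArch64.
// Reads and writes of `reg` are expected to map to the
// llvm.read_register / llvm.write_register intrinsics over the
// metadata string "x20".
register unsigned long reg asm("x20");

unsigned long read_reg(void) { return reg; }
void write_reg(unsigned long v) { reg = v; }
```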

bcardosolopes and others added 30 commits June 20, 2024 14:52
…lvm#501)

This PR adds the LLVMIR lowering support for CIR bit operations.

`cir.bit.clz`, `cir.bit.ctz`, and `cir.bit.popcount` can be lowered
directly to LLVM intrinsic calls to `@llvm.ctlz`, `@llvm.cttz`, and
`@llvm.ctpop`, respectively.

The other three bit operations, namely `cir.bit.clrsb`, `cir.bit.ffs`,
and `cir.bit.parity`, are lowered to sequences of LLVM IR instructions
that implement their functionality. This lowering scheme is also used by
the original clang CodeGen.
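
As a rough illustration (my example, not from the PR), the C builtins
that give rise to these CIR bit operations:

```
unsigned bit_queries(unsigned x) {
  // clz/ctz/popcount lower to @llvm.ctlz/@llvm.cttz/@llvm.ctpop;
  // clrsb/ffs/parity expand to short LLVM IR instruction sequences.
  // (Assumes x != 0, since clz/ctz of zero is undefined.)
  return __builtin_clz(x) + __builtin_ctz(x) + __builtin_popcount(x)
       + __builtin_clrsb((int)x) + __builtin_ffs((int)x)
       + __builtin_parity(x);
}
```
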
CIRGenFunction::buildFromMemory can handle `cir.bool` values, so we no
longer need to emit the `NYI` error here.
This is the next step in inline assembly support. It's more of a
service PR, mostly dedicated to the in/out argument types.

Also, operand attributes are added; this is the last change to the
`cir.asm` operation afaik. But I would wait until the next PR,
which will contain more examples and may help us arrive at a more
readable format for the operation.
Note that we have to add an attribute for each operand, because the
lowering of the LLVM dialect to LLVM IR iterates over them in the same
order.

The next PR will be the last one (so far) in the series of PRs dedicated
to inline assembly support. It will add storing of the results.
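
To make the in/out operand discussion concrete, a hedged sketch (mine,
assuming an AArch64 target) with one output and two input operands:

```
int asm_add(int a, int b) {
  int out;
  // "=r"(out) is an output operand; "r"(a) and "r"(b) are inputs.
  // Each operand carries its own attribute so that lowering from the
  // LLVM dialect to LLVM IR sees them in the same order.
  __asm__("add %w0, %w1, %w2" : "=r"(out) : "r"(a), "r"(b));
  return out;
}
```
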
Three small changes, all cleanup or refactoring in nature.

1. Fix the assemblyFormat for all the vector operations in the ClangIR
dialect so that vector types appear in ClangIR as `!cir.vector<type x
n>` instead of as just `<type x n>`. When I first created the vector
ops, I forgot to use `qualified` as necessary when writing out types.
This change fixes that. There is no change in behavior, but there is a
change to the text version of ClangIR, which required changing the
ClangIR expected results and ClangIR inputs in the tests.

2. Create a new `cir.vec.splat` operation and use that for "vector
splat", i.e. a conversion from a scalar to a vector. A "vector splat"
conversion had been implemented with `cir.vec.create` before. This
change results in different ClangIR and different LLVM IR, which again
required updating the tests, but no noticeable change in compiler
behavior.

3. Create an `IntegerVector` type constraint, which requires that the
given type be a vector whose element type is an integer. It can be any
integral type, and the vector can be of any size. Use the new type
constraint in the definition of `cir.vec.ternary`, whose condition
operand must be an `IntegerVector`. Remove the integral type check from
`VecTernaryOp::verify`, since doing the check there is now redundant.
The only possibly visible change is to the text of an error message when
validation of `cir.vec.ternary` fails. The expected output of a
validation test was updated with the new message.
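
As an illustration (my sketch, using GNU vector types via the
vector_size attribute), a scalar mixed into a vector binary operation is
one place where a vector splat conversion shows up:

```
typedef int vi4 __attribute__((vector_size(16)));

vi4 add_scalar(vi4 v, int s) {
  // The scalar `s` is broadcast ("splat") to a vector before the add;
  // that conversion is now modeled by the dedicated cir.vec.splat op.
  return v + s;
}
```
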
This PR intends to fix some problems with packed structure support, so
that llvm#473 will work.

Basically, the main problem for packed structure support is the
absence of arbitrarily sized integers in CIR. One workaround is to
use `mlir::IntegerType` for that, but it feels like the wrong way (please
correct me if I'm wrong). Another way is to introduce such a type in CIR.
So far I suggest this approach: instead of arbitrarily sized integers we
create an array of bytes for a bitfield storage whenever it doesn't fit
into the CIR `IntType`.

The original codegen creates storages with an alignment of 8 bits, so the
storage type can be `i24`, for instance. Previously, we only created
storages that could be represented as a CIR `IntType`: 8, 16, 32, or 64
bits. That worked until I ran into the need to support packed structures.
At first glance it's not a problem - just add the `determinePacked`
method from the original codegen and that's it. But it turned out that
this method _infers_ whether a structure is packed or not. It doesn't use
the AST attribute for that, as one might think - it works with the
offsets and alignments of fields. Thus, we either need to invent our own
way to determine packed structures (which is error-prone and maybe not
doable at all) or try to use the existing one. Also, we stay closer to
the original LLVM data layout this way.
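
A small C example (mine, not from the PR) of a bitfield run whose
storage does not fit a CIR `IntType`:

```
struct S {
  int a : 3;
  int b : 10;
  int c : 8;
};
// 21 bits of bitfields: the original codegen would likely pick an i24
// storage unit here; with this change CIR uses an array of 3 bytes.
```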

1) I had to move the lowering details from `LoweringPrepare` to
`LowerToLLVM`, because in CIR it's not possible to `load` an integer type
from the array of bytes, while in the LLVM dialect it is. Thus, all
the math operations can be expressed without any problems. Basically,
most of the diff you see is because of the changes in the lowering. The
remaining part is more or less easy to read.
2) There are minor changes in `CIRRecordLayoutBuilder` - as described
above, we may generate an array of bytes as a storage.
3) Some cosmetic changes in `CIRGenExpr` - since we don't want to infer
the storage type again and instead just use the one stored in the
`CIRGenBitFieldInfo`.
4) Helpers are introduced in the lowering - but nothing hard, just
shifts and logical ops.
5) I removed the `bitfield-ops` test, because the test cases covered
there are now all in `bitfields.c` and `bitfields.cpp`.

This is still a suggestion, though I believe it's a good one. You are
welcome to discuss it and suggest other ways to solve the problem.
Still missing lowering support, which will come next.
This doesn't change existing functionality; for existing crashes related
to atomics we just hit asserts a bit further now, but no support is added
just yet. The next set of commits will introduce functionality with test
cases.
Just like the previous commit, add more infra pieces; still NFC since all
relevant test cases hit asserts, just a bit deeper.
This PR adds support for packed structures.

Basically, now both `pragma pack(...)` and
`__attribute__((aligned(...)))` should work.
The only problem is that `getAlignment` is not complete - I fix only
a couple of issues I ran into, for struct types and arrays.
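
For illustration (my sketch, not from the PR), the two forms that should
now work:

```
#pragma pack(push, 1)
struct PackedByPragma {
  char c;
  int i;      // no padding: sizeof(PackedByPragma) == 5
};
#pragma pack(pop)

struct Overaligned {
  char c;
} __attribute__((aligned(16)));  // alignof(Overaligned) == 16
```
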
PR adds support for initialization of unions. The change is copy-pasted
from the original CodeGen.
This PR adds support for `__builtin_prefetch`. CIRGen of this builtin
emits the new `cir.prefetch` opcode. Then `cir.prefetch` lowers to the
`llvm.prefetch` intrinsic.

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
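
For reference, a minimal use of the builtin (my example):

```
void warm_cache(const void *p) {
  // rw = 0 (read), locality = 3 (keep in all cache levels); this emits
  // cir.prefetch, which is then lowered to the llvm.prefetch intrinsic.
  __builtin_prefetch(p, 0, 3);
}
```
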
This PR adds MemRead/MemWrite markers to the `GetBitfieldOp` and
`SetBitfieldOp` (as discussed in llvm#487)
Also, minor renaming in the `SetBitfieldOp`

---------

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
This PR adds support for `__builtin_constant_p`.
The implementation introduces the new `cir.is_constant` opcode during
code generation of the builtin.
The code generation is taken from the original LLVM codegen.
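
A small usage sketch (mine):

```
int scaled(int x) {
  // __builtin_constant_p folds to 1 only when `x` is known to be a
  // constant; the new cir.is_constant op models this query in CIR.
  return __builtin_constant_p(x) ? x * 2 : x;
}
```
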
This patch adds CIRGen support for the C++20 three-way comparison
operator `<=>`. The binary operator is directly lowered to existing CIR
operations.

Most of the changes are tests.
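
A small C++20 example of the operator (mine, not from the patch):

```
#include <compare>

auto spaceship(int a, int b) {
  // Lowered through existing CIR comparison operations.
  return a <=> b;  // std::strong_ordering
}
```
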
This change is taken from the original codegen.
…m#517)

This PR adds handling of AttributedStmt to support the fallthrough
attribute.
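
For example (my sketch), the attribute that AttributedStmt handling
enables:

```
int classify(int x) {
  switch (x) {
  case 0:
    x += 10;
    [[fallthrough]];  // an AttributedStmt wrapping the null statement
  case 1:
    return x;
  default:
    return -1;
  }
}
```
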
We start our journey towards `goto` support, and this is the first step
along that way.

There is some discussion in llvm#508, and according to the plan we do the
following here:
- Add a new pass called `cir-flatten-cfg` that is a stub for now but will
later be responsible for region inlining. The pass runs only if
`-emit-flat-cir` is passed on the command line. Thus, the clang behavior
is not changed here from the user's point of view.
- The pass will later be accompanied by a `goto` solver, so we are
talking about several passes that are mandatory for lowering into the
`llvm` dialect. There are at least two clients of this pass that will be
affected: `clang` itself and `cir-opt`, so we need a common point for
them: `populateCIRFlatteningPasses` and `populateCIRToLLVMPasses`
guarantee that the CIR will be in the correct state for all the clients,
whatever new passes we add later.
Implement `__builtin_shufflevector` and `__builtin_convertvector` in
ClangIR. This change contributes to the implementation of issue llvm#284.

`__builtin_convertvector` is implemented as a cast. LLVM IR uses the
same instructions for arithmetic conversions of both individual scalars
and entire vectors. So ClangIR does the same. The code for handling
conversions, in both CodeGen and Lowering, is cleaned up to correctly
handle vector types. To simplify the lowering code and avoid `if
(type.isa<VectorType>())` statements everywhere, the utility function
`elementTypeIfVector` was added to `LowerToLLVM.cpp`.

`__builtin_shufflevector` has two forms, only one of which appears to be
documented.

The documented form, which takes a variable-sized list of integer
constants for the indices, is implemented with the new ClangIR operation
`cir.vec.shuffle.ints`. This operation is lowered to the
`llvm.shufflevector` op.

The undocumented form, which gets the indices from a vector operand, is
implemented with the new ClangIR operation `cir.vec.shuffle.vec`. LLVM
IR does not have an instruction for this, so it gets lowered to a long
series of `llvm.extractelement` and `llvm.insertelement` operations.
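
A sketch of both builtins over GNU vector types (my example; the
two-operand shuffle is the undocumented form described above):

```
typedef int   vi4 __attribute__((vector_size(16)));
typedef float vf4 __attribute__((vector_size(16)));

vf4 to_float(vi4 v) {
  // Whole-vector arithmetic conversion: modeled as a cast in ClangIR.
  return __builtin_convertvector(v, vf4);
}

vi4 interleave_lo(vi4 a, vi4 b) {
  // Documented form with constant indices: cir.vec.shuffle.ints,
  // lowered to llvm.shufflevector.
  return __builtin_shufflevector(a, b, 0, 4, 1, 5);
}

vi4 permute(vi4 a, vi4 mask) {
  // Undocumented form with a runtime index vector: cir.vec.shuffle.vec,
  // lowered to a series of extractelement/insertelement operations.
  return __builtin_shufflevector(a, mask);
}
```
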
This introduces CIRGen and LLVM lowering for the first of a bunch
of these atomic operations; incremental work should generalize the
current constructs.
This PR adds support for the following intrinsic functions:
- `__builtin_bswap{16, 32, 64}`
- `_byteswap_{ushort, ulong, uint64}`

This PR adds a new `cir.bswap` operation to represent such an intrinsic
call. CIRGen and LLVMIR lowering for the new operation is included in
this PR.
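
For reference (my example), the GCC-style spellings covered here:

```
unsigned short     swap16(unsigned short x)     { return __builtin_bswap16(x); }
unsigned int       swap32(unsigned int x)       { return __builtin_bswap32(x); }
unsigned long long swap64(unsigned long long x) {
  // All three map to the new cir.bswap op, then to @llvm.bswap.*.
  return __builtin_bswap64(x);
}
```
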
This is the final commit for issue llvm#284. Vector types other than GNU
vector types will be covered by other yet-to-be-created issues.

Now that GNU vector types (the ones defined via the vector_size
attribute) are implemented, do a final cleanup of the assertions and
other checks related to vector types.

Remove `UnimplementedFeature::cirVectorType()`. Deal with the remaining
calls to that function. When the feature that is not yet implemented has
to do with Arm SVE vectors, the assert was changed to
`UnimplementedFeature::scalableVectors()` instead. The assertion was
removed in cases where the code correctly handles GNU vector types.

While cleaning up the assertion checks, I noticed that BinOp handling of
vector types wasn't quite complete. Any special handling for integer or
floating-point types wasn't happening when the operands were vector
types. To fix this, split `BinOpInfo::Ty` into two fields, `FullType`
and `CompType`. `FullType` is the type of the operands. `CompType` is
normally the same as `FullType`, but is the element type when `FullType`
is a vector type.
bcardosolopes and others added 18 commits July 8, 2024 12:32
… value (llvm#719)

This commit fixes GlobalOp lowering for floating-point globals without an
initial value. They are implicitly initialized with zeros.
E.g. `float f[100]; double d;`
ClangIR was failing on
```
__atomic_compare_exchange_n(&a, &old, 42, true, 5, 5);
```
The `true` was the problem. It would work with a literal `0` or `1`, but
not with a literal `true` or `false`.

The bug was in `isCstWeak` in CIRGenAtomic.cpp, which was only looking
for an integral constant. It didn't recognize a boolean constant and was
falling back on the non-constant path, which isn't implemented yet.

Rewrite `isCstWeak` to check for both integral and boolean constants.
This PR adds LLVM lowering for the following operations related to
complex numbers:

- `cir.complex.create`,
- `cir.complex.real_ptr`, and
- `cir.complex.imag_ptr`.

The LLVM IR generated for `cir.complex.create` is a bit ugly since it
includes the `insertvalue` instruction, which typically is not generated
in upstream CodeGen. Later we may need further CIR canonicalization
passes to try folding `cir.complex.create`.
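
A small GNU C/C++ sketch (mine; `_Complex` plus `__real__`/`__imag__`
are compiler extensions) of code that exercises these ops:

```
float _Complex z;

void set_z(float re, float im) {
  // The __real__/__imag__ lvalues correspond to cir.complex.real_ptr
  // and cir.complex.imag_ptr; building a whole value from two scalars
  // uses cir.complex.create.
  __real__ z = re;
  __imag__ z = im;
}
```
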
This PR fixes the bug described in
llvm#727 (comment). It
should resolve the crash reported in llvm#727.
This PR introduces a new attribute `OpenCLKernelMetadataAttr` to model
the OpenCL kernel metadata structurally in CIR, with its corresponding
implementations of CodeGen, Lowering and Translation.

The `"TypeAttr":$vec_type_hint` part is tricky because LLVM IR integer
types are signless, while SPIR-V requires signedness. According to the
spec, the final LLVM IR should encode the signedness with an extra `i32`
boolean value.

In this PR, the dropping logic from CIR's `TypeConverter` is still used
to avoid code duplication when lowering to the LLVM dialect. However, the
signedness is then restored (still encapsulated in a CIR attribute) and
dropped again in the translation into LLVM IR.
… BinOp (llvm#720)

This commit extends the pass to support hoisting loop-invariant BinOps as
SCF forOp boundaries.

E.g.
    // (100 - 1) should be hoisted out of the loop,
    // so the boundary can be an input operand to generate the SCF forOp.
    for (int i = 0; i < 100 - 1; ++i) {}
…#729)

Previously, when lowering the induction variable in a forOp, we removed
the IV load and replaced its users with SCF.IV.

The CIR IV users might still be CIR operations during forOp lowering.
This caused an issue where a CIR operation had SCF.IV as an operand,
which is an MLIR integer type instead of a CIR type.

This commit lowers
    CIR load IV_ADDR
to
    ARITH addi SCF.IV, 0

So SCF.IV can be propagated by OpAdaptor when lowering individual IV
users. This simplifies the lowering and fixes the issue. The redundant
arith.addi can be removed by later MLIR passes.
LLVM lowering for the following operations is introduced in llvm#616 and
llvm#651: `cos`, `exp`, `exp2`, `log`, `log10`, `log2`, `sin`, `sqrt`,
`fmod`, and `pow`. However, they are not lowered to their corresponding
LLVM intrinsics; instead they are transformed to libc calls during
lowering prepare. This does not match the upstream behavior.

This PR tries to correct this mistake. It makes all CIR FP intrinsic ops
lower to their corresponding LLVM intrinsics (`fmod` is a special case
and it is lowered to the `frem` LLVM instruction).
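
As a rough illustration (my sketch) of builtins that now reach LLVM
intrinsics directly:

```
double fp_mix(double x, double y) {
  // sqrt and pow lower to @llvm.sqrt and @llvm.pow; fmod is the special
  // case and becomes the frem instruction.
  return __builtin_sqrt(x) + __builtin_pow(x, y) + __builtin_fmod(x, y);
}
```
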
For now only handle the cir.try part; cir.catch is coming next. Using
flat CIR for tests makes this easy to build incrementally.
As the title says. Documentation will be in another PR, as it seems to
belong to a different upstream branch.
…lvm#733)

In [this
commit](llvm@e5d840b),
minimal support for Darwin aarch64 triples was added. But
TargetLoweringInfo was not updated correspondingly.

This could lead to a failure of the test `driver.c` with
CallConvLowering pass enabled (or `LowerModule` used in some other
ways).

This PR fixes the inconsistency and adds an extra missing feature flag
for it.
Although LowerModule is not yet ready for formal usage, we need it for
target-specific lowering to LLVM. This PR temporarily makes the symbol
`createLowerModule` public to reuse the logic of preparing a
`LowerModule`, making future refactoring easier (making
`TargetLoweringInfo` available to most stages in CIR lowering).
In this PR, we:
1. Implement defaultVisibility as far as dso_local is concerned;
currently it is either MLIR::Visibility isPublic() or isPrivate(). We
don't yet handle hiddenVisibility and protectedVisibility from the AST. I
put in a missing-feature assert so that if we ever translate
hiddenVisibility or protectedVisibility into
mlir::SymbolTable::Visibility::Private (hopefully not, for it'd be
confusing), we know to revise this defaultVisibility setting.
2. Call setNonAliasAttributes on the global op upon discovery of its
initialization, so globals have dso_local correctly set.

Still missing: lots of functions should have dso_local set, but they all
depend on the comdat implementation, which will come in the next PR
within the next few days.
OG codegen does not generate any exception-related content when there are
no calls inside the try block. For now we mimic OG and do the same,
until we see a concrete use case that would require emitting this code.
@566hub changed the title from "Realize RG and GET_RG op" to "[CIR]Realize RG and GET_RG op" on Jul 15, 2024
@bcardosolopes (Member) left a comment:

Thanks for working on this, it's a nice feature addition!

We can achieve the same by re-using existing ClangIR components. The suggested approach:

  • Create/use a regular global.
  • Use regular cir.get_global to retrieve the address.
  • Add a new attribute #cir.asm_reg that holds a StringAttr. This should be used to initialize the global.
  • Add an isGlobalRegister method to cir.global's extraClassDeclaration, which checks if the global has an initializer and whether that is a #cir.asm_reg.
  • Add support in LowerToLLVM.cpp to materialize cir.get_globals into metadata !<whatever>. cir.{load,store} referring to these globals should be converted to the proper intrinsic calls.
  • Use a more complete use case that exercises both intrinsics: https://godbolt.org/z/deGs17rox
  • Use clang/test/CIR/CodeGen/abstract-cond.c as an example for testing both CIR and LLVM output in the same test.

// cir.func no_proto @main() extra(#fn_attr) {
// %0 = cir.const #cir.int<1> : !s32i
// %1 = cir.cast(integral, %0 : !s32i), !s64i
// %2 = cir.get_RG @llvm.named.register.x20 : !cir.ptr<!u64i>
A repository Member left a comment on the snippet above:

Thanks for trying to improve this! We don't eagerly lower things to LLVM right away, though; that is done in LowerToLLVM.cpp.
