
[AAELF64] Clarify how addends work in MOVZ, MOVK and ADRP. #271

Merged

Conversation

statham-arm (Contributor)

This brings AAELF64 into line with AAELF32, which already has a similar clarification for the MOVW+MOVT pair. For the instructions which shift their operand left (ADRP, and the shifted MOVZ and MOVK), if the relocation addend is taken from the input value of the immediate field, it is not treated as shifted.

The rationale is that this allows a sequence of related instructions to consistently compute the same value (symbol + small offset), and cooperate to load that value into the target register, one small chunk at a time. For example, this would load mySymbol + 0x123:

  mov  x0, #0x123          ; R_AARCH64_MOVW_UABS_G0_NC(mySymbol)
  movk x0, #0x123, lsl #16 ; R_AARCH64_MOVW_UABS_G1_NC(mySymbol)
  movk x0, #0x123, lsl #32 ; R_AARCH64_MOVW_UABS_G2_NC(mySymbol)
  movk x0, #0x123, lsl #48 ; R_AARCH64_MOVW_UABS_G3(mySymbol)

The existing text made it unclear whether the addends were shifted or not. If they are interpreted as shifted, then nothing useful happens, because the first instruction would load the low 16 bits of mySymbol+0x123, and the second would load the next 16 bits of mySymbol+0x1230000, and so on. This doesn't reliably get you any useful offset from the symbol, because the relocations are processed independently, so that a carry out of the low 16 bits won't be taken into account in the next 16.

If you do need to compute a large offset from the symbol, you have no option but to use SHT_RELA and specify a full 64-bit addend: there's no way to represent that in an SHT_REL setup. But interpreting the SHT_REL addends in the way specified here, you can at least specify small addends successfully.
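
To make the intended processing concrete, here is a minimal C sketch of REL-style processing for the four relocations above (the helper names are my own, and this is an illustration of the rule rather than the specification's pseudocode):

  #include <stdint.h>

  /* MOVZ/MOVK encode imm16 in bits [20:5]. Under this clarification the
     initial addend is that field read as a signed 16-bit value, with no
     shift applied, regardless of the instruction's own LSL amount. */
  static int64_t read_addend(uint32_t insn) {
      return (int16_t)((insn >> 5) & 0xFFFF);
  }

  static uint32_t write_imm16(uint32_t insn, uint16_t imm) {
      return (insn & ~(0xFFFFu << 5)) | ((uint32_t)imm << 5);
  }

  /* group = 0..3 selects which 16-bit slice of S + A this instruction
     loads (G0_NC, G1_NC, G2_NC, G3). Each relocation recomputes the full
     64-bit value, so all four instructions agree on it, carries included. */
  static uint32_t relocate_movw_uabs(uint32_t insn, uint64_t sym, int group) {
      uint64_t value = sym + (uint64_t)read_addend(insn);
      return write_imm16(insn, (uint16_t)(value >> (16 * group)));
  }

With all four immediate fields holding 0x123 as in the example, each call computes the same value mySymbol + 0x123 and writes back a different 16-bit slice of it.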

@smithp35 (Contributor) left a comment


One small comment about the MOV alias, but it isn't a strong opinion.

As I understand it, the sequence

  ADRP x0, symbol + addend
  ADD  x0, x0, :lo12:symbol + addend

would likely limit the size of the addend on the ADRP to the same range [0, 4095] that can be expressed in the ADD/LDR.

I would like to think there's a way of relating addend1 and addend2 so that we always get the right offset within the page.

  ADRP x0, symbol + addend1
  ADD  x0, x0, :lo12:symbol + addend2

Off the top of my head, something like: addend2 is congruent to addend1 (modulo 4096). I'd need to think about that in more detail, as I could easily have that wrong.

If there is such an expression we could add that as a note.

@@ -940,6 +940,13 @@ A ``RELA`` format relocation must be used if the initial addend cannot be encode

There is no PC bias to accommodate in the relocation of a place containing an instruction that formulates a PC-relative address. The program counter reflects the address of the currently executing instruction.

There are two special cases for forming the initial addend of REL-type relocations where the immediate field cannot normally hold small signed integers:

* For relocations processing MOVZ and MOVK instructions, the initial addend is formed by interpreting the 16-bit literal field of the instruction as a 16-bit signed value in the range -32768 <= A < 32768. The interpretation is the same whether or not the instruction applies a left shift to its immediate: the addend is never treated as shifted.
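
As a concrete illustration (my own example, not part of the proposed text): an imm16 field holding 0xFFFD gives an initial addend of A = -3, whether the instruction is movz x0, #0xfffd or movk x0, #0xfffd, lsl #32; the shift only selects which bits of the relocated value the instruction loads, and never scales the addend.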
@smithp35 (Contributor)

Would it be worth including MOV (wide immediate), using the Arm ARM description? For example:

  For relocations processing MOV (wide immediate), MOVZ and MOVK instructions,

MOV (wide immediate) is an alias for MOVZ, so it is already covered by MOVZ and MOVK, but MOV is used a lot in code examples.

@statham-arm (Contributor, Author)

I dithered about that myself. Added a parenthesis.

@statham-arm (Contributor, Author)

Nice thought! I think you're absolutely right: you can make ADRP+ADD consistently generate up to a 19-bit signed offset even though the ADD immediate can only hold 12 bits of it, by just setting the ADD immediate to the low 12 bits. The error in computing the full value during processing of the ADD relocation doesn't matter, because it's only in bits 12 and above, which don't affect the output.
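
A small self-contained check of that argument (my own helper names, with the PC-page bias of a real ADRP omitted for brevity; a sketch, not the specification's processing):

  #include <assert.h>
  #include <stdint.h>

  /* Page(x), as used by the AAELF64 relocation operations. */
  static uint64_t page(uint64_t v) { return v & ~0xFFFULL; }

  int main(void) {
      uint64_t sym = 0x412345, addend = 0x1ABC;  /* addend wider than 12 bits */

      /* ADRP relocation: carries the full addend. */
      uint64_t adrp_part = page(sym + addend);

      /* ADD relocation: only bits [11:0] of its own S + A are encoded, so
         giving it just the low 12 bits of the addend loses nothing. */
      uint64_t add_part = (sym + (addend & 0xFFF)) & 0xFFF;

      /* The error in the ADD's S + A lies entirely in bits >= 12, which the
         ADD immediate never holds, so the combined result is exact. */
      assert(adrp_part + add_part == sym + addend);
      return 0;
  }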

@statham-arm (Contributor, Author)

Oops! The ADRP immediate is 21 bits, not 19. Almost didn't notice that the encoding has two more immediate bits near the top of the word.
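
For reference, a sketch of extracting that field, assuming the standard A64 encoding with immlo in bits [30:29] and immhi in bits [23:5] (my own helper, not spec pseudocode); as I read the change, this unshifted signed value is the initial addend for a REL-type ADRP relocation:

  #include <stdint.h>

  /* ADRP's immediate is immhi:immlo, 21 bits, with the sign bit at bit 20. */
  static int64_t adrp_imm21(uint32_t insn) {
      uint32_t immlo = (insn >> 29) & 0x3;      /* bits [30:29], 2 bits  */
      uint32_t immhi = (insn >> 5) & 0x7FFFF;   /* bits [23:5], 19 bits  */
      uint32_t imm21 = (immhi << 2) | immlo;
      /* Sign-extend from bit 20 by subtracting 2^21 when bit 20 is set. */
      return (int64_t)imm21 - ((int64_t)(imm21 & 0x100000) << 1);
  }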

@rearnsha

rearnsha commented Jul 2, 2024 via email

@Wilco1 (Contributor)

Wilco1 commented Jul 2, 2024

> So the question is, could existing code be using these fields in a different
> way (interpreting the bits as scaled by the field encoding information)? If
> they could, then this change would break existing usage and thus cannot be
> safely made, even if it would have been more useful in retrospect.

As explained, it is not possible to use a non-zero offset with REL MOVZ/MOVK/ADRP/ADD/LDR relocations under the shifted interpretation, since the individual relocations wouldn't all compute the same value.

@smithp35 (Contributor)

smithp35 commented Jul 2, 2024

> So the question is, could existing code be using these fields in a different
> way (interpreting the bits as scaled by the field encoding information)? If
> they could, then this change would break existing usage and thus cannot be
> safely made, even if it would have been more useful in retrospect.

While we can never be 100% certain, we've only got one known user of SHT_REL for AArch64, the Arm legacy assembler armasm, and that switches to RELA for the instructions with scaled addends like ADRP and MOVT/MOVW. Part of the motivation for this change is to correct an incorrect implementation in LLD.

On the use-case side, we can't be 100% certain, but I certainly can't think of a way a compiler could make use of the scaled immediates as relocation addends. I remember from the 32-bit case that we first implemented MOVT/MOVW in SHT_REL using scaled addends and soon ran into runtime failures, which provoked the redefinition to not scale the addends. It may be possible to come up with something in assembly that isn't contrived.

I think this is an area where I'd be comfortable taking the risk.

@smithp35 (Contributor) left a comment

Thanks for the updates; I'm approving this from my side. I suggest we leave this open for a few days for others to comment/object. Please use "Request changes" for any blocking feedback.

@rearnsha

rearnsha commented Jul 2, 2024 via email

@smithp35 (Contributor)

smithp35 commented Jul 2, 2024

It would have to be an expression of the form

  symbol + <immediate> * 2^16

where the symbol is aligned on a 2^16 boundary. The only thing that I can think of in that case is where symbol is the section symbol of a section aligned to at least 2^16, and the section is larger than 2^16 bytes.

Given how rare that use-case is compared to sections that are 4- or 8-byte aligned, I doubt that a potential SHT_REL-only toolchain would implement section symbol + offset like that. I'd expect it to use local anonymous symbols as anchors.

@smithp35 (Contributor)

smithp35 commented Jul 8, 2024

Will be merging this PR by end of UK working day 08/07/2024

@smithp35 (Contributor)

smithp35 commented Jul 8, 2024

End of UK working day. Merging this PR.

@smithp35 merged commit 1a448ed into ARM-software:main Jul 8, 2024
1 check passed
@sallyarmneale left a comment


No changes required.

@statham-arm deleted the aaelf64-reloc-clarification branch August 12, 2024 09:43