[SME] Add agnostic-ZA interface and routines to save/restore SME state #264
…ate. This implements requests to add a new "ZA-compatible" interface which can be called with ZA state being either 'off', 'active' or 'dormant', and which can preserve any and all state enabled under PSTATE.ZA.
aapcs64/aapcs64.rst
Outdated
|
||
.. _`private-ZA`: | ||
.. _`shared-ZA`: | ||
.. _`compatible-ZA`: |
Can we call this `ZA-agnostic`, for compatibility with the ACLE PR?
aapcs64/aapcs64.rst
Outdated
Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_ | ||
interface can both (at their option) choose to guarantee that they | ||
The compatible-ZA interface is intended to be called from any function | ||
without requiring a change to PSTATE.ZA and is generally used in conjunction |
I think we should make this stronger, and say that the function must preserve all state associated with PSTATE.ZA.
aapcs64/aapcs64.rst
Outdated
with the expectation that it `preserves ZA`_. | ||
|
||
Subroutines with a `private-ZA`_ interface, `shared-ZA`_ interface or | ||
`compatible-ZA`_ interface can (at their option) choose to guarantee that they |
…and then leave this paragraph as it was originally.
aapcs64/aapcs64.rst
Outdated
@@ -2069,6 +2077,8 @@ support routines: | |||
|
|||
* the current values of TPIDR2_EL0, PSTATE.SM and PSTATE.ZA. | |||
|
|||
* the registers enabled by PSTATE.ZA, if PSTATE.ZA is enabled. |
What is this change referring to? It doesn't look like the patch changes `__arm_sme_state`.
Sorry this should not have been committed. I've removed the change.
aapcs64/aapcs64.rst
Outdated
The size is guaranteed to be a multiple of 16. | ||
|
||
The layout that corresponds to the calculated size is unspecified, | ||
but the assumption is that the size always matches the implementation |
I think this is a requirement rather than an assumption.
aapcs64/aapcs64.rst
Outdated
* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine | ||
stores the full contents of ZT0 to ``PTR->ZT0``. | ||
|
||
* If bit 63 of ``OPTIONS`` is 0, then the subroutine disables PSTATE.ZA. |
This means that it isn't `compatible-ZA` according to the above table, since ZA is in a different state on return.
Also, bit 31 == 0 && bit 2 == 1 would be an invalid combination, since we would potentially have a nonnull (unsaved) TPIDR2_EL0 while PSTATE.ZA==0.
aapcs64/aapcs64.rst
Outdated
+--------+-----------------------------------------+ | ||
| bits | Options | | ||
+========+=========================================+ | ||
| 63 | Preserve ZA using lazy save mechanism | |
Is it worth the complication of supporting two ways of saving ZA? Couldn't we start with the lazy version, and add the non-lazy version as a variation later if necessary?
aapcs64/aapcs64.rst
Outdated
| 62 - 3 | Zero for this revision of the AAPCS64, | | ||
| | but reserved for future expansion | | ||
+--------+-----------------------------------------+ | ||
| 2 | Exclude TPIDR2_EL0 | |
Similarly, I'm not sure it's worth having a separate toggle for this and ZA.
aapcs64/aapcs64.rst
Outdated
|
||
* both bit 0 and bit 63 of ``OPTIONS`` are 1. | ||
|
||
* If PSTATE.ZA is 0, then the subroutine enables PSTATE.ZA. |
It should only do this if PSTATE.ZA was 1 when `__arm_sme_save` was called. Similarly for the restorations below.
aapcs64/aapcs64.rst
Outdated
``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by PTR: | ||
|
||
* If bit 63 of ``OPTIONS`` is 1 and TPIDR2_EL0 is null, then the function | ||
copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``. |
copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``. | |
points X0 at ``PTR->BLK`` and calls ``__arm_tpidr2_restore``. |
* Renamed `compatible ZA` -> `agnostic ZA`
* Changed __arm_sme_save/restore such that the save routine should record whether ZA or ZT0 is saved and such that the restore routine should check whether ZA or ZT0 was saved. This removes the (previously) implicit assumption in `__arm_sme_save` that PSTATE.ZA must be 1 if PTR is not nullptr.
* ZA is now always saved/restored using the lazy-save mechanism.
* Changed __arm_sme_save/restore to have a custom ZA interface instead of 'agnostic-ZA', which was incorrect.
efe901f to 6f1bba8
It isn't clear to me whether we expect callers of `__arm_sme_state_size` to check for zero returns, and skip the save and restore calls for that case. If so (“case A”), I think this makes things unnecessarily complex for the callers. If not (“case B”), the buffer should have nonzero size even if PSTATE.ZA==0, so that it can hold at least the `SAVED_ZA` and `SAVED_ZT0` fields.
And for case B, the save and restore functions should be callable even in threads that don't have access to SME.
In a similar vein, how about making `__arm_sme_restore` decide what to do based only on the `SAVED_*` fields, rather than passing `OPTIONS` again? The caller of the save and restore functions is expected to be the same, so it should know what needs to be preserved.
I wonder how useful the `OPTIONS` will be in practice, given that `__arm_sme_save` must leave PSTATE.ZA in a dormant or off state. It seems unlikely that a caller would know enough about its position in the call hierarchy/overall program to know that certain state doesn't need to be preserved, but at the same time not know enough for it to have a specific `__arm_in/out/inout/preserves` annotation. No objection to keeping `OPTIONS` if you prefer though.
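The caller-side sequence being argued for here ("case B", where the caller never checks for a zero size and the restore acts only on what the save recorded) might look roughly like the C model below. This is only a sketch: the `model_*` functions, the `model_thread` struct, the 16-byte header and the 64-byte payload are hypothetical stand-ins for `__arm_sme_state_size`, `__arm_sme_save` and `__arm_sme_restore`, and no real ZA contents are touched.

```c
#include <stddef.h>

/* Hypothetical model of per-thread state; not part of the ABI. */
struct model_thread {
    int has_sme;    /* 1 if the thread has access to SME */
    int za_active;  /* 1 if ZA is active (has live contents) */
};

/* Stand-in for __arm_sme_state_size. Under "case B" the result is never
   zero, so the buffer can always hold the bookkeeping header. */
static size_t model_state_size(const struct model_thread *t) {
    size_t header = 16;                    /* room for the SAVED_* fields   */
    size_t payload = t->has_sme ? 64 : 0;  /* placeholder for the real state */
    return header + payload;               /* stays a multiple of 16        */
}

/* Stand-in for __arm_sme_save: records whether anything was saved and
   leaves ZA not active. Safe to call without SME access. */
static void model_save(struct model_thread *t, unsigned char *buf) {
    unsigned char saved = (unsigned char)(t->has_sme && t->za_active);
    buf[0] = saved;
    if (saved)
        t->za_active = 0;
}

/* Stand-in for __arm_sme_restore: decides what to do from the buffer alone. */
static void model_restore(struct model_thread *t, const unsigned char *buf) {
    if (buf[0])
        t->za_active = 1;
}

/* Caller-side pattern: no zero check and no OPTIONS passed to restore.
   Returns what a callee would observe for za_active, or -1 if the
   model buffer is too small. */
static int model_call_around(struct model_thread *t) {
    _Alignas(16) unsigned char buf[128];
    size_t size = model_state_size(t);
    if (size > sizeof buf)
        return -1;
    model_save(t, buf);
    int seen_by_callee = t->za_active;  /* the callee would run here */
    model_restore(t, buf);
    return seen_by_callee;
}
```

In this model the same three calls work for a thread with SME access and for one without, which is the simplification for callers that "case B" is after.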
aapcs64/aapcs64.rst
Outdated
as well as any other state required for `__arm_sme_save`_ and | ||
`__arm_sme_restore`_. | ||
|
||
`__arm_sme_state_size`_ assumes that ZA is saved lazily and will account |
I think this is potentially confusing, since it isn't clear who is assumed to do the lazy saving. I think what this paragraph is saying comes under the previous sentence (“as well as any other state required for `__arm_sme_save`_ and `__arm_sme_restore`_”), and so it might be better just to delete this.
aapcs64/aapcs64.rst
Outdated
``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at unspecified offsets | ||
in the buffer pointed to by ``PTR``: | ||
|
||
* The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. |
* The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. | |
* The contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. |
Using “full” sounded odd in reference to a system register that contains a pointer.
aapcs64/aapcs64.rst
Outdated
|
||
* The value 1 is written to ``PTR->SAVED_ZA``. | ||
|
||
* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then for the addresses |
I assume that if bit 0 is 1, the function should turn PSTATE.ZA off before returning. In other words, ZA must not be active on return from this function.
Good spot, you're right!
The initial thinking was "case A", i.e. the intrinsics are only called when there is something worth saving. But now that I've added SAVED_ZA and SAVED_ZT0, it probably makes sense to remove that assumption.
My reasons for adding
where only "za" needs preserving across the call, but not "zt0". This could be done using explicit allocas, calling the
The reasoning for that was that one might only want to restore part of the saved state at one point, and restore the other part at another point. The compiler or user writing asm could use these routines to save/restore any state, e.g.
I don't think the new routines work for that case though. The routines rely on saving ZA lazily, whereas the above needs to save ZA eagerly.
Yes, that was the reason I initially added the "lazily" as an option to the struct. I figured it should still be possible to set up the lazy-save using these routines and then commit the lazy-save manually before the call to
The difference for me is that the new routines are fundamentally about processing the live ZA state (and any future SME state) such that it's safe to call a normal function. In some cases this will involve turning PSTATE.ZA off. This is different from what the mixed-sharing example needs: there the intention is explicitly to keep ZA on (and active). The new routines are also about providing an interface that can handle unknown future state. Thus the assumption is that extra state must be stored unless explicitly turned off (via When the callee shares some ZA state, but shares less state than the caller, the caller has to do an eager save and restore of the remaining state. I think we should just treat this as being conceptually like any other caller save. Admittedly the save and restore are quite heavy operations for ZA, but in concept they're not much different from a register save and restore. Treating them like that also helps with your later example involving calls to separate
As @rsandifo-arm explained, the same routines cannot be used for general saving/restoring of state enabled by PSTATE.ZA, because the expectations on the ZA interface are different when partially saving/restoring that state. Removing the option drastically simplifies the logic. Note that I have removed the two booleans (internal to the save/restore routines) that distinguish between having saved 'ZA' and 'ZT0', and replaced them with a single 'VALID' bit, because I think we can assume that the save/restore routines are called by PEs that have the same SME state.
Sorry for only raising this now, but I wonder if we should reconsider the case in which ZA is dormant on entry to an agnostic-ZA function. Rather than say that agnostic-ZA functions must preserve the state (and so leave ZA dormant), should we instead say that agnostic-ZA behaves like private-ZA for that case? No other PSTATE.ZA state can be live if ZA is dormant. In particular, if ZA is dormant then ZT0 must be dead.
In other words, agnostic-ZA is like private-ZA except that ZA can be active on entry. If ZA is active on entry, it must be unchanged on return.
(If in future there is any other SME state that needs to be preserved, and that is not controlled by PSTATE.ZA, then that would still need to be saved and restored.)
The dormant-on-entry case isn't too interesting for direct calls between streaming code and agnostic-ZA functions, since the streaming code wouldn't be expected to set up a lazy save in that case. But perhaps it is more relevant if streaming code calls a normal function and that normal function calls a private-ZA function. (Ideally, the normal function would be changed to private-ZA, but that might not always be possible for ABI reasons.)
Other than that, it looks good to me. Some minor comments below.
aapcs64/aapcs64.rst
Outdated
* For addresses ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at | ||
unspecified offsets in the buffer pointed to by ``PTR``: | ||
|
||
* The subroutine aborts in some platform-specific manner if the current |
We can remove this, since:
If the thread has access to SME then it must also have access to TPIDR2_EL0.
aapcs64/aapcs64.rst
Outdated
|
||
* The current thread does not have access to SME. | ||
|
||
* The current thread does not have access to ``TPIDR2_EL0`` when PSTATE.ZA |
Similarly, this can be deleted.
aapcs64/aapcs64.rst
Outdated
* X1-X15, X19-X29 and SP are call-preserved. | ||
* Z0-Z31 are call-preserved. | ||
* P0-P15 are call-preserved. | ||
* the subroutine `preserves ZA`_. |
Might be better to delete this, since “agnostic-ZA” has its own requirements about the state of ZA on return.
aapcs64/aapcs64.rst
Outdated
|
||
* The custom ``ZA`` interface has the following properties: | ||
|
||
* If ZA state is 'off' on entry and SME state needs restoring, then it is |
active or dormant, since if ZA was dormant on entry to `__arm_sme_save`, and if the lazy save is later committed, we'll end up restoring the previous dormant state. Similarly for the final bullet point.
But I think we move this down immediately before “The subroutine behaves as follows:” and then just say something like:
- The ZA state on normal return is the same as the ZA state on entry to the `__arm_sme_save` call that initialized `PTR`.
(Sentence feels a bit clunky though, so improvements definitely welcome.)
aapcs64/aapcs64.rst
Outdated
* The subroutine takes the following arguments: | ||
|
||
PTR | ||
a 64-bit data pointer passed in X0 that points to a buffer which |
How about instead saying:
- a 64-bit data pointer in X0 that must previously have been passed to `__arm_sme_save`. The buffer that it points to must still contain the data written by that call.
?
aapcs64/aapcs64.rst
Outdated
* The subroutine behaves as follows: | ||
|
||
* If ``PTR`` does not point to a valid buffer with the required size, the | ||
behaviour of calling this routine is undefined. |
behavior (and similarly throughout).
This changes the 'dormant on entry' case to be 'no change'.
The specification of these routines can be found here: ARM-software/abi-aa#264
This implements the lowering of calls from agnostic-ZA functions to non-agnostic-ZA functions, using the ABI routines `__arm_sme_state_size`, `__arm_sme_save` and `__arm_sme_restore`. This implements the proposal described in the following PRs:
* ARM-software/acle#336
* ARM-software/abi-aa#264
aapcs64/aapcs64.rst
Outdated
* If ZA state is 'active' on entry, then it is 'dormant' on normal return. | ||
Otherwise the ZA state is unchanged. |
* If ZA state is 'active' on entry, then it is 'dormant' on normal return. | |
Otherwise the ZA state is unchanged. | |
* If ZA state is `active <ZA active state>`_ on entry, then it is | |
`dormant <ZA dormant state>`_ on normal return. Otherwise the ZA state | |
is unchanged. |
(untested) with a new internal target for “ZA dormant state”.
For some reason, these links don't work for me when I generate the PDF (active or dormant), even though I've spelled them correctly.
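The rule quoted in this thread (active on entry becomes dormant on normal return, anything else is unchanged) can be written down as a tiny C function. The enum and function names are made up for illustration and are not part of the AAPCS64 text.

```c
/* ZA states as named in the discussion; the enum itself is hypothetical. */
enum za_state { ZA_OFF, ZA_DORMANT, ZA_ACTIVE };

/* ZA state on normal return from the save routine, per the quoted bullet:
   'active' on entry becomes 'dormant'; otherwise the state is unchanged. */
static enum za_state za_state_after_save(enum za_state on_entry) {
    return on_entry == ZA_ACTIVE ? ZA_DORMANT : on_entry;
}
```

One consequence visible in the model is that the result is never `ZA_ACTIVE`, which matches the earlier review point that ZA must not be active on return from the save routine.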
aapcs64/aapcs64.rst
Outdated
|
||
* For the address ``PTR->VALID`` at an unspecified offset in the buffer, | ||
the value 0 is written to ``PTR->VALID`` and the subroutine returns, if | ||
either of the following conditions is true: |
either of the following conditions is true: | |
one of the following conditions is true: |
aapcs64/aapcs64.rst
Outdated
|
||
* The subroutine behaves as follows: | ||
|
||
* If the current thread has access to FEAT_SME and PSTATE.ZA is 1, |
Maybe “ZA is active” (perhaps with the same kind of internal link as the comment below) now that the routines don't save anything when ZA is dormant.
aapcs64/aapcs64.rst
Outdated
|
||
* The subroutine behaves as follows: | ||
|
||
* If ``PTR`` does not point to a valid buffer with the required size, the |
Perhaps we should explicitly require the routine to “abort the program in some platform-defined manner” if `PTR` is not 16-byte aligned. This would allow the low 4 bits to be used for future extensions. Same for the restore routine.
This would then be “Otherwise, if …”
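A minimal C sketch of the suggested alignment check follows. The helper names are hypothetical, and `abort()` is used here only as one possible "platform-defined manner" of aborting; the comment does not prescribe a mechanism.

```c
#include <stdint.h>
#include <stdlib.h>

/* A pointer is 16-byte aligned exactly when its low 4 bits are zero,
   which is what would leave those bits free for future extensions. */
static int ptr_is_16_byte_aligned(const void *ptr) {
    return ((uintptr_t)ptr & 0xF) == 0;
}

/* Hypothetical entry check for the save and restore routines. */
static void model_check_ptr(const void *ptr) {
    if (!ptr_is_16_byte_aligned(ptr))
        abort();  /* stand-in for the platform-defined abort */
}
```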
aapcs64/aapcs64.rst
Outdated
* If ``PTR`` does not point to a valid buffer with the required size, the | ||
behavior of calling this routine is undefined. | ||
|
||
* The ZA state on normal return is the same as the ZA state on entry to the |
This is no longer true if ZA was dormant or off, since the save might have been committed between the calls. Perhaps we should just delete this bullet point and deal with the various cases below. E.g. the next bullet point deals with the case where ZA was dormant or off on entry to the save function.
aapcs64/aapcs64.rst
Outdated
|
||
* The current thread does not have access to SME. | ||
|
||
* ZA state is 'active' on entry. |
* ZA state is 'active' on entry. | |
* ZA state is `active <ZA active state>`_ on entry. |
This PR adds a new "agnostic-ZA" interface which is intended to be called from any subroutine without requiring a change to PSTATE.ZA. This PR also adds new SME ABI routines to save/restore state enabled by PSTATE.ZA.
The corresponding ACLE PR can be found here.