[SME] Add agnostic-ZA interface and routines to save/restore SME state #264
base: main
Changes from 7 commits
aeccbc9
6f1bba8
34eeecc
35aae18
447378a
82c84bd
20cf743
7568774
@@ -1749,10 +1749,11 @@ ZA interfaces
As noted in `ZA states`_, there are three possible ZA states: off,
dormant, and active. A subroutine's “ZA interface” specifies the possible
states of ZA on entry to a subroutine and the possible states of ZA on a
-`normal return`_. The AAPCS64 defines two types of ZA interface:
+`normal return`_. The AAPCS64 defines three types of ZA interface:

.. _`private-ZA`:
.. _`shared-ZA`:
.. _`agnostic-ZA`:

+-------------------+-------------------+---------------------------+
| Type of interface | ZA state on entry | ZA state on normal return |
@@ -1761,6 +1762,10 @@ states of ZA on entry to a subroutine and the possible states of ZA on a
+-------------------+-------------------+---------------------------+
| shared ZA         | active            | active                    |
+-------------------+-------------------+---------------------------+
|                   | active or off     | unchanged                 |
| agnostic ZA       +-------------------+---------------------------+
|                   | dormant           | unchanged or off          |
+-------------------+-------------------+---------------------------+

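The shared-ZA and agnostic-ZA rows above can be read as predicates over pairs of (state on entry, state on normal return). The following C sketch is illustrative only: the type ``za_state`` and both helper functions are hypothetical names introduced here and are not part of the ABI.

```c
#include <stdbool.h>

/* The three ZA states named in the text. */
typedef enum { ZA_OFF, ZA_DORMANT, ZA_ACTIVE } za_state;

/* Hypothetical helper: may a shared-ZA subroutine be entered in state
 * `entry` and return normally in state `ret`? */
static bool shared_za_allows(za_state entry, za_state ret)
{
    return entry == ZA_ACTIVE && ret == ZA_ACTIVE;
}

/* Hypothetical helper covering the two agnostic-ZA rows.
 * Every entry state is acceptable; only the return state is constrained. */
static bool agnostic_za_allows(za_state entry, za_state ret)
{
    if (entry == ZA_DORMANT)
        return ret == ZA_DORMANT || ret == ZA_OFF; /* unchanged or off */
    return ret == entry;                           /* active or off: unchanged */
}
```

Under this reading, an agnostic-ZA subroutine entered with ZA active must return with ZA active, while one entered with ZA dormant may return with ZA either dormant or off.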
Every subroutine has exactly one ZA interface. A subroutine's ZA interface
is independent of all other aspects of its interface. Callers must know
@@ -1776,6 +1781,13 @@ The shared-ZA interface is so called because it allows the subroutine
to share ZA contents with its caller. This can be useful if an SME
operation is split into several cooperating subroutines.

The `agnostic-ZA`_ interface is intended to be called from any subroutine
without requiring a change to PSTATE.ZA. Subroutines with an `agnostic-ZA`_
interface behave like subroutines with a `private-ZA`_ interface when ZA is
off or dormant on entry, but must additionally allow ZA to be active on
entry; in this case, the subroutine must preserve all state associated with
PSTATE.ZA when returning normally.

Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_
interface can both (at their option) choose to guarantee that they
`preserve ZA`_.
@@ -2081,6 +2093,17 @@ support routines:
``__arm_get_current_vg``
  Provides a safe way to detect the current value of VG.

``__arm_sme_state_size``
  Provides a simple way to query the total size required to save the requested
  state.

``__arm_sme_save``
  Provides a safe way to save state enabled by PSTATE.ZA to a buffer.

``__arm_sme_restore``
  Provides a safe way to restore state enabled by PSTATE.ZA from a buffer.


``__arm_sme_state``
^^^^^^^^^^^^^^^^^^^

@@ -2305,6 +2328,188 @@ value of VG, with the subroutine having the following properties:

* Otherwise, the subroutine returns the value 0 in X0.


``__arm_sme_state_size``
^^^^^^^^^^^^^^^^^^^^^^^^

**(Beta)**

Platforms that support SME must provide a subroutine that returns a size
that is large enough to represent all state enabled by PSTATE.ZA.

* The subroutine is called ``__arm_sme_state_size``.

* The subroutine has an `agnostic-ZA`_ `streaming-compatible interface`_ with
  the following properties:

  * X1-X15, X19-X29 and SP are call-preserved.
  * Z0-Z31 are call-preserved.
  * P0-P15 are call-preserved.

* The subroutine takes no arguments.

* The subroutine returns an unsigned double word in X0 that represents
  a size in bytes that is large enough to represent all state enabled by
  PSTATE.ZA as well as any other state required for `__arm_sme_save`_ and
  `__arm_sme_restore`_.

  The exact layout used to calculate the size is unspecified. The
  implementations of `__arm_sme_save`_ and `__arm_sme_restore`_ and
  `__arm_sme_state_size`_ must all assume the same layout.

  The size is guaranteed to be a multiple of 16.

* The subroutine behaves as follows:

  * If the current thread has access to FEAT_SME and PSTATE.ZA is 1,
    X0 contains the total size required to save and restore all SME state
    enabled by PSTATE.ZA.

  * Otherwise, X0 contains a size large enough to represent internal state
    required for `__arm_sme_save`_ and `__arm_sme_restore`_.

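Because the returned size is a multiple of 16 and the save and restore routines require a 16-byte aligned pointer, a caller could obtain a suitable buffer along the following lines. This is a minimal sketch, not the required usage: ``alloc_sme_state_buffer`` and ``example_state_size`` are hypothetical names, the size query is passed in as a function pointer so that the sketch does not depend on the real ``__arm_sme_state_size`` symbol, and rejecting a size of zero is a choice made by this sketch rather than something the text requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in type for a routine shaped like
 * __arm_sme_state_size: no arguments, size in bytes returned. */
typedef uint64_t (*state_size_fn)(void);

/* Hypothetical helper: allocate a 16-byte aligned buffer of the size
 * reported by `query_size`. Returns NULL on failure. */
static void *alloc_sme_state_buffer(state_size_fn query_size, size_t *out_size)
{
    uint64_t size = query_size();

    /* The text guarantees a multiple of 16; treat anything else, and a
     * size of zero, as an error in this sketch. */
    if (size == 0 || size % 16 != 0)
        return NULL;

    /* C11 aligned_alloc expects the size to be a multiple of the
     * alignment, which the guarantee above provides. */
    void *buf = aligned_alloc(16, (size_t)size);
    if (buf != NULL && out_size != NULL)
        *out_size = (size_t)size;
    return buf;
}

/* Arbitrary illustrative size; a real platform reports its own value. */
static uint64_t example_state_size(void)
{
    return 257 * 16;
}
```

A caller would pass the resulting pointer to the save and restore routines and release it with ``free`` afterwards. Code that cannot call a heap allocator at this point would need a different strategy, such as reserving stack space.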
``__arm_sme_save``
^^^^^^^^^^^^^^^^^^

**(Beta)**

Platforms that support SME must provide a subroutine to save any state enabled
by PSTATE.ZA.

* The subroutine is called ``__arm_sme_save``.

* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
  the following properties:

  * X1-X15, X19-X29 and SP are call-preserved.
  * Z0-Z31 are call-preserved.
  * P0-P15 are call-preserved.

* The subroutine takes the following arguments:

  PTR
    a 64-bit data pointer passed in X0 that points to a buffer which
    is guaranteed to have a size that is equal to or larger than the size
    returned by `__arm_sme_state_size`_. The pointer must be 16-byte aligned.

* The subroutine does not return a value.

* The subroutine behaves as follows:

  * If ``PTR`` does not point to a valid buffer with the required size, the
    behavior of calling this subroutine is undefined.

Review comment: Perhaps we should explicitly require the routine to “abort the program in some platform-defined manner” if This would then be “Otherwise, if …”

  * If ZA state is 'active' on entry, then it is 'dormant' on normal return.
    Otherwise the ZA state is unchanged.

Review comment: Suggested change (untested) with a new internal target for “ZA dormant state”.

Review comment: For some reason, these links don't work for me when I generate the PDF (active or dormant), even though I've spelled them correctly.

  * For the address ``PTR->VALID`` at an unspecified offset in the buffer,
    the value 0 is written to ``PTR->VALID`` and the subroutine returns, if
    either of the following conditions is true:


    * The current thread does not have access to SME.

    * PSTATE.ZA is 0.

    * TPIDR2_EL0 is not a NULL pointer.

  * For addresses ``PTR->BLK`` and ``PTR->ZA`` at unspecified offsets in
    the buffer pointed to by ``PTR``, the routine must set up a lazy save
    by zero-initializing the TPIDR2 block at address ``PTR->BLK``, then
    writing address ``PTR->ZA`` to ``PTR->BLK.za_save_buffer``, writing the
    streaming vector length in bytes (``SVL.B``) to
    ``PTR->BLK.num_za_save_slices`` and finally copying the address ``PTR->BLK``
    to ``TPIDR2_EL0``.

  * If ZT0 is available, then for the address ``PTR->ZT0`` at an unspecified
    offset in the buffer pointed to by ``PTR``, the contents of ZT0 are written
    to ``PTR->ZT0``.

  * The value 1 is written to ``PTR->VALID``.

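The save steps listed above can be traced with a small software model. The sketch below is not an implementation of ``__arm_sme_save``: it runs on any host, replaces processor state with a hypothetical ``cpu_model`` structure, and invents a concrete buffer layout (``sme_buffer``) even though the text leaves the layout unspecified. ``MODEL_SVL_B`` is an assumed vector length chosen only to keep the model small, and ``tpidr2_block`` models just the two fields named in the text.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { MODEL_SVL_B = 16 };  /* assumed streaming vector length in bytes */

/* Only the two TPIDR2 block fields named in the text are modelled. */
typedef struct {
    void    *za_save_buffer;
    uint16_t num_za_save_slices;
} tpidr2_block;

/* Hypothetical buffer layout; the real layout is unspecified. */
typedef struct {
    uint64_t     valid;                          /* PTR->VALID */
    tpidr2_block blk;                            /* PTR->BLK   */
    uint8_t      za[MODEL_SVL_B * MODEL_SVL_B];  /* PTR->ZA    */
    uint8_t      zt0[64];                        /* PTR->ZT0 (ZT0 is 512 bits) */
} sme_buffer;

/* Hypothetical stand-in for the processor state that the routine reads. */
typedef struct {
    bool          has_sme;     /* thread has access to SME */
    bool          has_zt0;     /* ZT0 is available         */
    bool          pstate_za;   /* PSTATE.ZA                */
    tpidr2_block *tpidr2_el0;  /* TPIDR2_EL0               */
    uint8_t       zt0[64];
} cpu_model;

/* Follows the listed steps in order. */
static void model_sme_save(cpu_model *cpu, sme_buffer *ptr)
{
    /* Nothing to save: record that the buffer holds no state. */
    if (!cpu->has_sme || !cpu->pstate_za || cpu->tpidr2_el0 != NULL) {
        ptr->valid = 0;
        return;
    }

    /* Set up the lazy save. ZA was active; pointing TPIDR2_EL0 at the
     * block makes it dormant. */
    memset(&ptr->blk, 0, sizeof ptr->blk);
    ptr->blk.za_save_buffer = ptr->za;
    ptr->blk.num_za_save_slices = MODEL_SVL_B;
    cpu->tpidr2_el0 = &ptr->blk;

    if (cpu->has_zt0)
        memcpy(ptr->zt0, cpu->zt0, sizeof ptr->zt0);

    ptr->valid = 1;
}
```

In this model, the ZA contents themselves are not copied by the save step. They are written to ``PTR->ZA`` only if a later callee commits the lazy save, which is what makes the save cheap when no callee touches ZA.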
``__arm_sme_restore``
^^^^^^^^^^^^^^^^^^^^^

**(Beta)**

Platforms that support SME must provide a subroutine to restore any state
enabled by PSTATE.ZA.

* The subroutine is called ``__arm_sme_restore``.

* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
  the following properties:

  * X1-X15, X19-X29 and SP are call-preserved.
  * Z0-Z31 are call-preserved.
  * P0-P15 are call-preserved.

* The subroutine takes the following arguments:

  PTR
    a 64-bit data pointer passed in X0 that points to a buffer that
    is initialized by a call to `__arm_sme_save`_. The pointer must be 16-byte
    aligned.

* The subroutine does not return a value.

* The subroutine behaves as follows:

  * If ``PTR`` does not point to a valid buffer with the required size, the
    behavior of calling this routine is undefined.

  * The ZA state on normal return is the same as the ZA state on entry to the
    call to `__arm_sme_save`_ that was used to initialize the buffer
    pointed to by ``PTR``.

Review comment: This is no longer true if ZA was dormant or off, since the save might have been committed between the calls. Perhaps we should just delete this bullet point and deal with the various cases below. E.g. the next bullet point deals with the case where ZA was dormant or off on entry to the save function.

  * For the address ``PTR->VALID`` at an unspecified offset in the buffer,
    if the value stored at address ``PTR->VALID`` is 0, then the subroutine
    does nothing.

  * Otherwise, the subroutine aborts in some platform-specific manner if
    either of the following conditions is true:

    * The current thread does not have access to SME.

    * ZA state is 'active' on entry.

  * If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA.

  * For the address ``PTR->BLK`` at an unspecified offset in the buffer
    pointed to by ``PTR``:

    * If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to
      ``PTR->BLK`` and calls ``__arm_tpidr2_restore``.

    * The value 0 is written to ``TPIDR2_EL0``.

  * If ZT0 is available, then for the address ``PTR->ZT0`` at an
    unspecified offset in the buffer pointed to by ``PTR``, the contents of
    ``PTR->ZT0`` are copied to ZT0.

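The restore steps can be traced with the same kind of host-side model. As before, this is a sketch under stated assumptions rather than an implementation of ``__arm_sme_restore``: ``cpu_model``, ``sme_buffer`` and ``MODEL_SVL_B`` are hypothetical, the buffer layout is invented for illustration, a plain ``memcpy`` stands in for the call to ``__arm_tpidr2_restore``, and the platform-specific abort is reported as a return value so that the model stays testable.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { MODEL_SVL_B = 16 };  /* assumed streaming vector length in bytes */

/* Only the two TPIDR2 block fields named in the text are modelled. */
typedef struct {
    void    *za_save_buffer;
    uint16_t num_za_save_slices;
} tpidr2_block;

/* Hypothetical buffer layout; the real layout is unspecified. */
typedef struct {
    uint64_t     valid;
    tpidr2_block blk;
    uint8_t      za[MODEL_SVL_B * MODEL_SVL_B];
    uint8_t      zt0[64];
} sme_buffer;

/* Hypothetical stand-in for the processor state. */
typedef struct {
    bool          has_sme;
    bool          has_zt0;
    bool          pstate_za;
    tpidr2_block *tpidr2_el0;
    uint8_t       za[MODEL_SVL_B * MODEL_SVL_B];
    uint8_t       zt0[64];
} cpu_model;

typedef enum { RESTORE_DONE, RESTORE_NOTHING_TO_DO, RESTORE_WOULD_ABORT } restore_result;

static restore_result model_sme_restore(cpu_model *cpu, const sme_buffer *ptr)
{
    if (ptr->valid == 0)
        return RESTORE_NOTHING_TO_DO;

    /* ZA is active when PSTATE.ZA is 1 and no lazy save is pending. */
    bool za_active = cpu->pstate_za && cpu->tpidr2_el0 == NULL;
    if (!cpu->has_sme || za_active)
        return RESTORE_WOULD_ABORT;  /* the real routine aborts here */

    if (!cpu->pstate_za)
        cpu->pstate_za = true;

    /* A NULL TPIDR2_EL0 means a callee committed the lazy save, so the
     * ZA contents must be reloaded from the buffer. */
    if (cpu->tpidr2_el0 == NULL)
        memcpy(cpu->za, ptr->blk.za_save_buffer, sizeof cpu->za);
    cpu->tpidr2_el0 = NULL;

    if (cpu->has_zt0)
        memcpy(cpu->zt0, ptr->zt0, sizeof cpu->zt0);

    return RESTORE_DONE;
}
```

The model distinguishes the two non-aborting cases: if ZA is still dormant on entry, its contents were never disturbed and only ``TPIDR2_EL0`` needs clearing, whereas if ZA is off the contents are reloaded from the buffer.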
Dynamic symbols for supported state
-----------------------------------

A platform that supports SME may provide a set of dynamic symbols.

The availability of these dynamic symbols indicates whether SME state is
supported by the routines provided by the platform. These symbols
can be used during dynamic linking to verify that SME state used in the
program will be handled correctly by the runtime.

This is particularly relevant for calls to `agnostic-ZA`_ functions, which
can't make assumptions on PSTATE.ZA or what state is enabled by it. These
functions rely on the routines defined in `SME support routines`_ to preserve
all SME state that may be live in the caller. The level of support required
by the program must therefore match the level of support provided by the
runtime, which for dynamically linked executables can only be asserted
during dynamic linking.

* ``__arm_sme_routines_support_zt0`` is available when the SME support routines
  support ZT0.


Pseudo-code examples
====================

Review comment: Maybe “ZA is active” (perhaps with the same kind of internal link as the comment below) now that the routines don't save anything when ZA is dormant.