[SME] Add agnostic-ZA interface and routines to save/restore SME state #264
@@ -1749,10 +1749,11 @@ ZA interfaces | |||||
As noted in `ZA states`_, there are three possible ZA states: off, | ||||||
dormant, and active. A subroutine's “ZA interface” specifies the possible | ||||||
states of ZA on entry to a subroutine and the possible states of ZA on a | ||||||
-`normal return`_. The AAPCS64 defines two types of ZA interface:
+`normal return`_. The AAPCS64 defines three types of ZA interface:
|
||||||
.. _`private-ZA`: | ||||||
.. _`shared-ZA`: | ||||||
.. _`agnostic-ZA`: | ||||||
|
||||||
+-------------------+-------------------+---------------------------+ | ||||||
| Type of interface | ZA state on entry | ZA state on normal return | | ||||||
|
@@ -1761,6 +1762,9 @@ states of ZA on entry to a subroutine and the possible states of ZA on a | |||||
+-------------------+-------------------+---------------------------+ | ||||||
| shared ZA | active | active | | ||||||
+-------------------+-------------------+---------------------------+ | ||||||
| agnostic ZA | active, dormant | unchanged | | ||||||
| | or off | | | ||||||
+-------------------+-------------------+---------------------------+ | ||||||
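
For illustration only, and not part of the proposed specification text, the table can be read as a pair of predicates. The C sketch below encodes it. The type and function names are hypothetical. The private-ZA row is not visible in this hunk, so its values here ("dormant or off" on entry, "unchanged" on return) are an assumption taken from the existing AAPCS64 text.

```c
#include <assert.h>
#include <stdbool.h>

/* The three ZA states named in the text. */
typedef enum { ZA_OFF, ZA_DORMANT, ZA_ACTIVE } za_state;

/* The three ZA interface types (names are illustrative). */
typedef enum { ZA_PRIVATE, ZA_SHARED, ZA_AGNOSTIC } za_interface;

/* May a subroutine with interface `iface` be entered with ZA in `entry`? */
static bool za_entry_allowed(za_interface iface, za_state entry)
{
    switch (iface) {
    case ZA_PRIVATE:  /* assumed row: dormant or off */
        return entry == ZA_DORMANT || entry == ZA_OFF;
    case ZA_SHARED:   /* active only */
        return entry == ZA_ACTIVE;
    case ZA_AGNOSTIC: /* active, dormant or off */
        return true;
    }
    return false;
}

/* ZA state required on a normal return, given the state on entry. */
static za_state za_state_on_return(za_interface iface, za_state entry)
{
    assert(za_entry_allowed(iface, entry));
    /* shared ZA returns "active"; the other rows return "unchanged". */
    return iface == ZA_SHARED ? ZA_ACTIVE : entry;
}
```

Under this reading, an agnostic-ZA subroutine is the only kind that a caller can enter in any of the three states without first changing PSTATE.ZA.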
|
||||||
Every subroutine has exactly one ZA interface. A subroutine's ZA interface | ||||||
is independent of all other aspects of its interface. Callers must know | ||||||
|
@@ -1776,6 +1780,10 @@ The shared-ZA interface is so called because it allows the subroutine | |||||
to share ZA contents with its caller. This can be useful if an SME | ||||||
operation is split into several cooperating subroutines. | ||||||
|
||||||
The agnostic-ZA interface is intended to be called from any function without | ||||||
requiring a change to PSTATE.ZA and must preserve all state associated with | ||||||
PSTATE.ZA. | ||||||
|
||||||
Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_ | ||||||
interface can both (at their option) choose to guarantee that they | ||||||
`preserve ZA`_. | ||||||
|
@@ -2081,6 +2089,17 @@ support routines: | |||||
``__arm_get_current_vg`` | ||||||
Provides a safe way to detect the current value of VG. | ||||||
|
||||||
``__arm_sme_state_size`` | ||||||
Provides a simple way to query the total size required to save the requested | ||||||
state. | ||||||
|
||||||
``__arm_sme_save`` | ||||||
Provides a safe way to save state enabled by PSTATE.ZA to a buffer. | ||||||
|
||||||
``__arm_sme_restore`` | ||||||
Provides a safe way to restore state enabled by PSTATE.ZA from a buffer. | ||||||
|
||||||
|
||||||
``__arm_sme_state`` | ||||||
^^^^^^^^^^^^^^^^^^^ | ||||||
|
||||||
|
@@ -2305,6 +2324,221 @@ value of VG, with the subroutine having the following properties: | |||||
|
||||||
* Otherwise, the subroutine returns the value 0 in X0. | ||||||
|
||||||
|
||||||
``__arm_sme_state_size`` | ||||||
^^^^^^^^^^^^^^^^^^^^^^^^ | ||||||
|
||||||
**(Beta)** | ||||||
|
||||||
Platforms that support SME must provide a subroutine that returns a size | ||||||
that is large enough to represent all state enabled by PSTATE.ZA. | ||||||
|
||||||
* The subroutine is called ``__arm_sme_state_size``. | ||||||
|
||||||
* The subroutine has an `agnostic-ZA`_ `streaming-compatible interface`_ with | ||||||
the following properties: | ||||||
|
||||||
* X1-X15, X19-X29 and SP are call-preserved. | ||||||
* Z0-Z31 are call-preserved. | ||||||
* P0-P15 are call-preserved. | ||||||
* the subroutine `preserves ZA`_. | ||||||
|
||||||
* The subroutine takes the following argument: | ||||||
|
||||||
OPTIONS | ||||||
a 64-bit value passed in X0 describing the following options: | ||||||
|
||||||
+--------+-----------------------------------------+ | ||||||
| bits | Options | | ||||||
+========+=========================================+ | ||||||
| 63 - 2 | Zero for this revision of the AAPCS64, | | ||||||
| | but reserved for future expansion | | ||||||
+--------+-----------------------------------------+ | ||||||
| 1 | Exclude ZT0 | | ||||||
+--------+-----------------------------------------+ | ||||||
| 0 | Exclude ZA | | ||||||
+--------+-----------------------------------------+ | ||||||
|
||||||
A value of 0 means that all SME state will be considered in the size | ||||||
calculation. | ||||||
|
||||||
* The subroutine returns an unsigned double word in X0 that represents | ||||||
a size in bytes that is large enough to represent all state enabled by | ||||||
PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``, | ||||||
as well as any other state required for `__arm_sme_save`_ and | ||||||
`__arm_sme_restore`_. | ||||||
|
||||||
`__arm_sme_state_size`_ assumes that ZA is saved lazily and will account
for the save of ``TPIDR2_EL0``.

Review comment: I think this is potentially confusing, since it isn't clear who is assumed to do the lazy saving. I think what this paragraph is saying comes under the previous sentence: “as well as any other state required for
|
||||||
The exact layout used to calculate the size is unspecified. The | ||||||
implementations of `__arm_sme_save`_ and `__arm_sme_restore`_ and | ||||||
`__arm_sme_state_size`_ must all assume the same layout. | ||||||
|
||||||
The size is guaranteed to be a multiple of 16. | ||||||
|
||||||
* The subroutine behaves as follows: | ||||||
|
||||||
* If the current thread has access to FEAT_SME and PSTATE.ZA is 1,
X0 contains the total size required to represent all SME state enabled
under PSTATE.ZA predicated on the requirements specified in ``OPTIONS``.

Review comment (on the condition above): Maybe “ZA is active” (perhaps with the same kind of internal link as the comment below) now that the routines don't save anything when ZA is dormant.
|
||||||
* Otherwise, X0 is 0. | ||||||
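
As a rough sketch of how the ``OPTIONS`` bits and the size rules above could fit together, the following C model computes a size for one hypothetical buffer layout. The proposal leaves the layout unspecified, so every constant other than the two option bits is an assumption: the macro and function names, the 16-byte slots for flags, the 16-byte TPIDR2 block, ZA taken as ``SVL.B`` x ``SVL.B`` bytes and ZT0 as 64 bytes. It is a host-side model and does not call the real routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments from the OPTIONS table; the macro names are illustrative. */
#define SME_OPT_EXCLUDE_ZA   (UINT64_C(1) << 0)
#define SME_OPT_EXCLUDE_ZT0  (UINT64_C(1) << 1)
#define SME_OPT_RESERVED     (~UINT64_C(3))      /* bits 63-2, zero for now */

/* Hypothetical model of the size query for one assumed layout. */
static uint64_t model_state_size(uint64_t options, bool has_sme,
                                 bool pstate_za, bool has_zt0,
                                 uint64_t svl_b)
{
    assert((options & SME_OPT_RESERVED) == 0);

    /* "Otherwise, X0 is 0": no SME access, or PSTATE.ZA is 0. */
    if (!has_sme || !pstate_za)
        return 0;

    uint64_t size = 0;
    if (!(options & SME_OPT_EXCLUDE_ZA)) {
        size += 16;             /* assumed: SAVED_ZA flag + saved TPIDR2_EL0 */
        size += 16;             /* assumed: TPIDR2 block used for a lazy save */
        size += svl_b * svl_b;  /* ZA storage: SVL.B x SVL.B bytes */
    }
    if (has_zt0 && !(options & SME_OPT_EXCLUDE_ZT0)) {
        size += 16;             /* assumed: SAVED_ZT0 flag, padded */
        size += 64;             /* ZT0 is 512 bits */
    }

    /* The size is guaranteed to be a multiple of 16. */
    return (size + 15) & ~UINT64_C(15);
}
```

With ``svl_b`` equal to 16 and no bits set in ``OPTIONS``, this particular layout needs 368 bytes; setting bit 0 (exclude ZA) drops that to 80. A real implementation is free to choose different numbers, provided the save and restore routines agree with it.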
|
||||||
|
||||||
``__arm_sme_save`` | ||||||
^^^^^^^^^^^^^^^^^^ | ||||||
|
||||||
**(Beta)** | ||||||
|
||||||
Platforms that support SME must provide a subroutine to save any state enabled | ||||||
by PSTATE.ZA. | ||||||
|
||||||
* The subroutine is called ``__arm_sme_save``. | ||||||
|
||||||
* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with | ||||||
the following properties: | ||||||
|
||||||
* X2-X15, X19-X29 and SP are call-preserved. | ||||||
* Z0-Z31 are call-preserved. | ||||||
* P0-P15 are call-preserved. | ||||||
|
||||||
* The subroutine takes the following arguments: | ||||||
|
||||||
OPTIONS | ||||||
a 64-bit value passed in X0 describing the following options: | ||||||
|
||||||
+--------+-----------------------------------------+ | ||||||
| bits | Options | | ||||||
+========+=========================================+ | ||||||
| 63 - 2 | Zero for this revision of the AAPCS64, | | ||||||
| | but reserved for future expansion | | ||||||
+--------+-----------------------------------------+ | ||||||
| 1 | Exclude ZT0 | | ||||||
+--------+-----------------------------------------+ | ||||||
| 0 | Exclude ZA | | ||||||
+--------+-----------------------------------------+ | ||||||
|
||||||
A value of 0 means all SME state will be saved. | ||||||
|
||||||
PTR | ||||||
a 64-bit data pointer passed in X1 that points to a buffer which is | ||||||
guaranteed to be large enough to represent all SME state for the | ||||||
requirements specified by ``OPTIONS``. | ||||||
|
||||||
* The subroutine does not return a value. | ||||||
|
||||||
* The subroutine behaves as follows: | ||||||
|
||||||
* The subroutine aborts in some platform-specific manner if either of the | ||||||
following conditions is true: | ||||||
|
||||||
* The current thread does not have access to SME. | ||||||
|
||||||
* The current thread does not have access to ``TPIDR2_EL0`` when | ||||||
PSTATE.ZA is 1. | ||||||
|
||||||
* If ``PTR`` does not point to a valid buffer with the required size, the
behaviour of calling this routine is undefined.

Review comment: Perhaps we should explicitly require the routine to “abort the program in some platform-defined manner” if This would then be “Otherwise, if …”
|
||||||
* If PSTATE.ZA is 0, the subroutine does nothing. | ||||||
|
||||||
* If bit 0 of ``OPTIONS`` is 0, then for addresses ``PTR->SAVED_ZA``, | ||||||
``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at unspecified offsets | ||||||
in the buffer pointed to by ``PTR``: | ||||||
|
||||||
* The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``. | ||||||
Review comment (suggested change): Using “full” sounded odd in reference to a system register that contains a pointer.
||||||
|
||||||
* The address ``PTR->ZA`` is written to ``PTR->BLK.za_save_buffer``, | ||||||
the streaming vector length in bytes (``SVL.B``) is written to | ||||||
``PTR->BLK.num_za_save_slices`` and the address ``PTR->BLK`` is | ||||||
written to ``TPIDR2_EL0``, thus setting up a lazy save. | ||||||
|
||||||
* The value 1 is written to ``PTR->SAVED_ZA``. | ||||||
|
||||||
Review comment: I assume that if bit 0 is 1, the function should turn PSTATE.ZA off before returning. In other words, ZA must not be active on return from this function.

Reply: Good spot, you're right!

* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then for the addresses
``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified offsets in the
buffer pointed to by ``PTR``:
|
||||||
* The full contents of ZT0 are written to ``PTR->ZT0``. | ||||||
|
||||||
* The value 1 is written to ``PTR->SAVED_ZT0``. | ||||||
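
The save steps listed above can be traced with a small host-side model. This is a sketch under stated assumptions, not an implementation of ``__arm_sme_save``. The ``model_cpu`` and ``model_buf`` structures, their field order and the fixed ``MODEL_SVL_B`` are all hypothetical, and the platform-specific abort cases are left out. It also does not model the point raised in review about the state of PSTATE.ZA on return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SVL_B 16  /* assumed streaming vector length in bytes */
enum { OPT_EXCLUDE_ZA = 1 << 0, OPT_EXCLUDE_ZT0 = 1 << 1 };

/* TPIDR2 block fields named in the text (16 bytes on a 64-bit target). */
typedef struct {
    void    *za_save_buffer;
    uint16_t num_za_save_slices;
    uint8_t  reserved[6];
} tpidr2_block;

/* Hypothetical stand-in for the thread's SME-related state. */
typedef struct {
    bool      pstate_za;
    uintptr_t tpidr2_el0;
    uint16_t  svl_b;
    bool      has_zt0;
    uint8_t   zt0[64];
} model_cpu;

/* Hypothetical buffer layout; the real offsets are unspecified. */
typedef struct {
    uint64_t     saved_za;    /* PTR->SAVED_ZA   */
    uint64_t     saved_zt0;   /* PTR->SAVED_ZT0  */
    uintptr_t    tpidr2_el0;  /* PTR->TPIDR2_EL0 */
    tpidr2_block blk;         /* PTR->BLK        */
    uint8_t      zt0[64];     /* PTR->ZT0        */
    uint8_t      za[MODEL_SVL_B * MODEL_SVL_B];  /* PTR->ZA */
} model_buf;

static void model_sme_save(model_cpu *cpu, uint64_t options, model_buf *ptr)
{
    assert(cpu != NULL && ptr != NULL);

    if (!cpu->pstate_za)
        return;                                   /* PSTATE.ZA is 0: do nothing */

    if (!(options & OPT_EXCLUDE_ZA)) {
        ptr->tpidr2_el0 = cpu->tpidr2_el0;        /* keep the old TPIDR2_EL0 */
        ptr->blk.za_save_buffer = ptr->za;
        ptr->blk.num_za_save_slices = cpu->svl_b;
        cpu->tpidr2_el0 = (uintptr_t)&ptr->blk;   /* lazy save is now set up */
        ptr->saved_za = 1;
    }

    if (!(options & OPT_EXCLUDE_ZT0) && cpu->has_zt0) {
        memcpy(ptr->zt0, cpu->zt0, sizeof ptr->zt0);
        ptr->saved_zt0 = 1;
    }
}
```

Note that in this model the ZA contents are not copied at save time: only the lazy-save block is armed, so ``PTR->ZA`` is filled later, and only if some callee commits the lazy save.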
|
||||||
``__arm_sme_restore`` | ||||||
^^^^^^^^^^^^^^^^^^^^^ | ||||||
|
||||||
**(Beta)** | ||||||
|
||||||
Platforms that support SME must provide a subroutine to restore any state | ||||||
enabled by PSTATE.ZA. | ||||||
|
||||||
* The subroutine is called ``__arm_sme_restore``. | ||||||
|
||||||
* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with | ||||||
the following properties: | ||||||
|
||||||
* X2-X15, X19-X29 and SP are call-preserved. | ||||||
* Z0-Z31 are call-preserved. | ||||||
* P0-P15 are call-preserved. | ||||||
|
||||||
* The subroutine takes the following arguments: | ||||||
|
||||||
OPTIONS | ||||||
a 64-bit value passed in X0 describing the following options: | ||||||
|
||||||
+--------+-----------------------------------------+ | ||||||
| bits | Options | | ||||||
+========+=========================================+ | ||||||
| 63 - 2 | Zero for this revision of the AAPCS64, | | ||||||
| | but reserved for future expansion | | ||||||
+--------+-----------------------------------------+ | ||||||
| 1 | Exclude ZT0 | | ||||||
+--------+-----------------------------------------+ | ||||||
| 0 | Exclude ZA | | ||||||
+--------+-----------------------------------------+ | ||||||
|
||||||
A value of 0 means all SME state will be restored. | ||||||
|
||||||
PTR | ||||||
a 64-bit data pointer passed in X1 that points to a buffer which is | ||||||
guaranteed to be large enough to represent all SME state for the | ||||||
requirements specified by ``OPTIONS``. | ||||||
|
||||||
* The subroutine does not return a value. | ||||||
|
||||||
* The subroutine behaves as follows: | ||||||
|
||||||
* The subroutine aborts in some platform-specific manner if either of the | ||||||
following conditions is true: | ||||||
|
||||||
* The current thread does not have access to SME. | ||||||
|
||||||
* The current thread does not have access to ``TPIDR2_EL0`` when | ||||||
PSTATE.ZA is 1. | ||||||
|
||||||
* If ``PTR`` does not point to a valid buffer with the required size, the | ||||||
behaviour of calling this routine is undefined. | ||||||
|
||||||
* For addresses ``PTR->SAVED_ZA``, ``PTR->BLK`` and ``PTR->TPIDR2_EL0`` | ||||||
at unspecified offsets in the buffer pointed to by ``PTR``, if | ||||||
``PTR->SAVED_ZA`` is 1 and bit 0 of ``OPTIONS`` is 0, then: | ||||||
|
||||||
* If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. | ||||||
|
||||||
* If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to | ||||||
``PTR->BLK`` and calls ``__arm_tpidr2_restore``. | ||||||
|
||||||
* The contents of ``PTR->TPIDR2_EL0`` are copied to ``TPIDR2_EL0``. | ||||||
|
||||||
* For addresses ``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified | ||||||
offsets in the buffer pointed to by ``PTR``, if ``PTR->SAVED_ZT0`` is 1 | ||||||
and bit 1 of ``OPTIONS`` is 0, then: | ||||||
|
||||||
* If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA. | ||||||
|
||||||
* The full contents of ``PTR->ZT0`` are copied to ZT0. | ||||||
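
The restore steps can be traced in the same way. The sketch below is a host-side model only. The structures, their layout and ``MODEL_SVL_B`` are hypothetical, the abort cases are omitted, and ``model_tpidr2_restore`` is a simplified stand-in for ``__arm_tpidr2_restore`` that just copies the saved slices back (its exact behaviour is an assumption here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SVL_B 16  /* assumed streaming vector length in bytes */
enum { OPT_EXCLUDE_ZA = 1 << 0, OPT_EXCLUDE_ZT0 = 1 << 1 };

typedef struct {
    void    *za_save_buffer;
    uint16_t num_za_save_slices;
    uint8_t  reserved[6];
} tpidr2_block;

/* Hypothetical stand-in for the thread's SME-related state. */
typedef struct {
    bool      pstate_za;
    uintptr_t tpidr2_el0;
    uint8_t   za[MODEL_SVL_B * MODEL_SVL_B];
    uint8_t   zt0[64];
} model_cpu;

/* Hypothetical buffer layout; the real offsets are unspecified. */
typedef struct {
    uint64_t     saved_za, saved_zt0;
    uintptr_t    tpidr2_el0;
    tpidr2_block blk;
    uint8_t      zt0[64];
    uint8_t      za[MODEL_SVL_B * MODEL_SVL_B];
} model_buf;

/* Simplified stand-in for __arm_tpidr2_restore: copy saved slices to ZA. */
static void model_tpidr2_restore(model_cpu *cpu, const tpidr2_block *blk)
{
    if (blk->za_save_buffer == NULL)
        return;
    size_t slices = blk->num_za_save_slices < MODEL_SVL_B
                        ? blk->num_za_save_slices : MODEL_SVL_B;
    memcpy(cpu->za, blk->za_save_buffer, slices * MODEL_SVL_B);
}

static void model_sme_restore(model_cpu *cpu, uint64_t options,
                              const model_buf *ptr)
{
    assert(cpu != NULL && ptr != NULL);

    if (ptr->saved_za == 1 && !(options & OPT_EXCLUDE_ZA)) {
        cpu->pstate_za = true;                 /* enable PSTATE.ZA if it was 0 */
        if (cpu->tpidr2_el0 == 0)              /* lazy save was committed */
            model_tpidr2_restore(cpu, &ptr->blk);
        cpu->tpidr2_el0 = ptr->tpidr2_el0;     /* put back the old value */
    }

    if (ptr->saved_zt0 == 1 && !(options & OPT_EXCLUDE_ZT0)) {
        cpu->pstate_za = true;                 /* enable PSTATE.ZA if it was 0 */
        memcpy(cpu->zt0, ptr->zt0, sizeof cpu->zt0);
    }
}
```

A null ``TPIDR2_EL0`` on entry is what tells the routine that ZA contents were written to the buffer in the meantime and must be reloaded; if the lazy save was never committed, only the register value is put back.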
|
||||||
|
||||||
Pseudo-code examples | ||||||
==================== | ||||||
|
||||||
|
Review comment: Might be better to delete this, since “agnostic-ZA” has its own requirements about the state of ZA on return.