[SME] Add agnostic-ZA interface and routines to save/restore SME state #264

Open
wants to merge 8 commits into base: main
194 changes: 193 additions & 1 deletion aapcs64/aapcs64.rst
@@ -1749,10 +1749,11 @@ ZA interfaces
As noted in `ZA states`_, there are three possible ZA states: off,
dormant, and active. A subroutine's “ZA interface” specifies the possible
states of ZA on entry to a subroutine and the possible states of ZA on a
`normal return`_. The AAPCS64 defines two types of ZA interface:
`normal return`_. The AAPCS64 defines three types of ZA interface:

.. _`private-ZA`:
.. _`shared-ZA`:
.. _`agnostic-ZA`:

+-------------------+-------------------+---------------------------+
| Type of interface | ZA state on entry | ZA state on normal return |
@@ -1761,6 +1762,9 @@ states of ZA on entry to a subroutine and the possible states of ZA on a
+-------------------+-------------------+---------------------------+
| shared ZA | active | active |
+-------------------+-------------------+---------------------------+
| agnostic ZA | active, dormant | unchanged |
| | or off | |
+-------------------+-------------------+---------------------------+

Every subroutine has exactly one ZA interface. A subroutine's ZA interface
is independent of all other aspects of its interface. Callers must know
@@ -1776,6 +1780,10 @@ The shared-ZA interface is so called because it allows the subroutine
to share ZA contents with its caller. This can be useful if an SME
operation is split into several cooperating subroutines.

The agnostic-ZA interface is intended to be called from any function without
requiring a change to PSTATE.ZA and must preserve all state associated with
PSTATE.ZA.
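
The entry and return rules for the interfaces shown in the table above can be written down as a small checker. This is only an illustrative sketch: the names ``za_state``, ``za_interface``, ``za_entry_ok`` and ``za_return_ok`` are hypothetical and not part of the AAPCS64, and only the shared-ZA and agnostic-ZA rows visible in this diff are modelled.

```c
#include <stdbool.h>

/* The three ZA states named in the text. */
enum za_state { ZA_OFF, ZA_DORMANT, ZA_ACTIVE };

/* Hypothetical tags for the two interface rows visible in this diff. */
enum za_interface { SHARED_ZA, AGNOSTIC_ZA };

/* Returns true if `entry` is a permitted ZA state on entry. */
bool za_entry_ok(enum za_interface itf, enum za_state entry)
{
    switch (itf) {
    case SHARED_ZA:
        return entry == ZA_ACTIVE;
    case AGNOSTIC_ZA:
        /* Callable with ZA active, dormant or off. */
        return entry == ZA_ACTIVE || entry == ZA_DORMANT || entry == ZA_OFF;
    }
    return false;
}

/* Returns true if the (entry, return) pair satisfies the interface. */
bool za_return_ok(enum za_interface itf, enum za_state entry,
                  enum za_state ret)
{
    if (!za_entry_ok(itf, entry))
        return false;
    switch (itf) {
    case SHARED_ZA:
        return ret == ZA_ACTIVE;
    case AGNOSTIC_ZA:
        /* "unchanged": the state on return equals the state on entry. */
        return ret == entry;
    }
    return false;
}
```

The checker makes the key difference explicit: a shared-ZA subroutine pins the state to active, while an agnostic-ZA subroutine accepts any state and must hand the same state back.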

Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_
interface can both (at their option) choose to guarantee that they
`preserve ZA`_.
@@ -2081,6 +2089,17 @@ support routines:
``__arm_get_current_vg``
Provides a safe way to detect the current value of VG.

``__arm_sme_state_size``
Provides a simple way to query the total size required to save the requested
state.

``__arm_sme_save``
Provides a safe way to save state enabled by PSTATE.ZA to a buffer.

``__arm_sme_restore``
Provides a safe way to restore state enabled by PSTATE.ZA from a buffer.


``__arm_sme_state``
^^^^^^^^^^^^^^^^^^^

@@ -2305,6 +2324,179 @@ value of VG, with the subroutine having the following properties:

* Otherwise, the subroutine returns the value 0 in X0.


``__arm_sme_state_size``
^^^^^^^^^^^^^^^^^^^^^^^^

**(Beta)**

Platforms that support SME must provide a subroutine that returns a size
that is large enough to represent all state enabled by PSTATE.ZA.

* The subroutine is called ``__arm_sme_state_size``.

* The subroutine has an `agnostic-ZA`_ `streaming-compatible interface`_ with
the following properties:

* X1-X15, X19-X29 and SP are call-preserved.
* Z0-Z31 are call-preserved.
* P0-P15 are call-preserved.
* the subroutine `preserves ZA`_.

Review comment (Contributor):

Might be better to delete this, since “agnostic-ZA” has its own requirements about the state of ZA on return.


* The subroutine takes no arguments.

* The subroutine returns an unsigned double word in X0 that represents
a size in bytes that is large enough to represent all state enabled by
PSTATE.ZA as well as any other state required for `__arm_sme_save`_ and
`__arm_sme_restore`_.

The exact layout used to calculate the size is unspecified. The
implementations of `__arm_sme_save`_ and `__arm_sme_restore`_ and
`__arm_sme_state_size`_ must all assume the same layout.

The size is guaranteed to be a multiple of 16.

* The subroutine behaves as follows:

* If the current thread has access to FEAT_SME and PSTATE.ZA is 1,

Review comment (Contributor):

Maybe “ZA is active” (perhaps with the same kind of internal link as the comment below) now that the routines don't save anything when ZA is dormant.

X0 contains the total size required to save and restore all SME state
enabled under PSTATE.ZA.

* Otherwise, X0 contains a size large enough to represent internal state
required for `__arm_sme_save`_ and `__arm_sme_restore`_.
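
As a rough caller-side sketch of how the returned size might be used: the helper below queries a size and allocates a buffer for it. Everything here is an assumption made for illustration. ``alloc_sme_state_buffer``, ``sme_state_size_fn`` and ``stub_state_size_za_off`` are hypothetical names, the size routine is passed as a function pointer so the sketch can run on a machine without SME, and a plain C function pointer does not capture the register-preservation properties listed above. The 16-byte alignment is a choice of this sketch, not a requirement stated in the proposed text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in type for a size query such as __arm_sme_state_size. */
typedef uint64_t (*sme_state_size_fn)(void);

/* Hypothetical stub: pretends only 16 bytes of internal state are needed,
 * as might be the case when PSTATE.ZA is 0. The value is made up. */
uint64_t stub_state_size_za_off(void)
{
    return 16;
}

/* Queries the size and allocates a buffer that is at least that large.
 * Returns NULL on allocation failure or if the reported size is 0. */
void *alloc_sme_state_buffer(sme_state_size_fn size_fn, size_t *size_out)
{
    size_t size = (size_t)size_fn();

    /* The text guarantees that the size is a multiple of 16. */
    assert(size % 16 == 0);
    if (size == 0)
        return NULL;

    /* A multiple-of-16 size is a valid request for 16-byte alignment. */
    void *buf = aligned_alloc(16, size);
    if (buf != NULL && size_out != NULL)
        *size_out = size;
    return buf;
}
```

Because the layout is unspecified, the caller treats the buffer as opaque and only ever passes the pointer on to the save and restore routines.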


``__arm_sme_save``
^^^^^^^^^^^^^^^^^^

**(Beta)**

Platforms that support SME must provide a subroutine to save any state enabled
by PSTATE.ZA.

* The subroutine is called ``__arm_sme_save``.

* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
the following properties:

* X1-X15, X19-X29 and SP are call-preserved.
* Z0-Z31 are call-preserved.
* P0-P15 are call-preserved.

* The custom ``ZA`` interface has the following properties:

* If ZA state is 'off' or 'dormant' on entry, then it is unchanged on normal
return.
* If ZA state is 'active' on entry, then it is 'dormant' on normal return.

* The subroutine takes the following arguments:

PTR
a 64-bit data pointer passed in X0 that points to a buffer which
is guaranteed to have a size that is equal to or larger than the size
returned by `__arm_sme_state_size`_.

* The subroutine does not return a value.

* The subroutine behaves as follows:

* If ``PTR`` does not point to a valid buffer with the required size, the

Review comment (Contributor):

Perhaps we should explicitly require the routine to “abort the program in some platform-defined manner” if PTR is not 16-byte aligned. This would allow the low 4 bits to be used for future extensions. Same for the restore routine.

This would then be “Otherwise, if …”

behaviour of calling this subroutine is undefined.

* For the address ``PTR->VALID`` at an unspecified offset in the buffer,
if the current thread does not have access to SME or if PSTATE.ZA is 0,
the value 0 is written to ``PTR->VALID`` and the subroutine returns.

* For addresses ``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at
unspecified offsets in the buffer pointed to by ``PTR``:

* The subroutine aborts in some platform-specific manner if the current

Review comment (Contributor):

We can remove this, since:

If the thread has access to SME then it must also have access to TPIDR2_EL0.

thread does not have access to ``TPIDR2_EL0``.

* The contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``.

* The address ``PTR->ZA`` is written to ``PTR->BLK.za_save_buffer``,
the streaming vector length in bytes (``SVL.B``) is written to
``PTR->BLK.num_za_save_slices`` and the address ``PTR->BLK`` is
written to ``TPIDR2_EL0``, thus setting up a lazy save.

* If ZT0 is available, then for the address ``PTR->ZT0`` at an
unspecified offset in the buffer pointed to by ``PTR``:

* The contents of ZT0 are written to ``PTR->ZT0``.

* The value 1 is written to ``PTR->VALID``.
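
The save steps above can be sketched as a host-side model in C. This is not an implementation of ``__arm_sme_save``: the thread state is simulated in a struct, the buffer layout (``struct sme_save_buf``) is hypothetical because the text leaves all offsets unspecified, and ``MODEL_SVL_B`` is an arbitrary streaming vector length chosen for the model. The platform-specific abort for a missing ``TPIDR2_EL0`` is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MODEL_SVL_B = 16, MODEL_ZT0_BYTES = 64 }; /* model-only sizes */

/* Fields of the TPIDR2 block that the text refers to. */
struct tpidr2_block {
    void *za_save_buffer;
    uint16_t num_za_save_slices;
};

/* Hypothetical layout; the real offsets are unspecified. */
struct sme_save_buf {
    uint64_t valid;                   /* PTR->VALID */
    struct tpidr2_block blk;          /* PTR->BLK */
    struct tpidr2_block *tpidr2_el0;  /* PTR->TPIDR2_EL0 */
    uint8_t za[MODEL_SVL_B * MODEL_SVL_B]; /* PTR->ZA */
    uint8_t zt0[MODEL_ZT0_BYTES];     /* PTR->ZT0 */
};

/* Simulated per-thread state. */
struct thread_model {
    bool has_sme;
    bool has_zt0;
    bool pstate_za;
    struct tpidr2_block *tpidr2_el0;
    uint8_t zt0[MODEL_ZT0_BYTES];
};

void model_sme_save(struct thread_model *t, struct sme_save_buf *ptr)
{
    assert(t != NULL && ptr != NULL); /* an invalid PTR is undefined */

    /* No SME access or PSTATE.ZA == 0: record that nothing was saved. */
    if (!t->has_sme || !t->pstate_za) {
        ptr->valid = 0;
        return;
    }

    /* Save the current TPIDR2_EL0 value. */
    ptr->tpidr2_el0 = t->tpidr2_el0;

    /* Set up a lazy save: ZA contents are not copied here. */
    ptr->blk.za_save_buffer = ptr->za;
    ptr->blk.num_za_save_slices = MODEL_SVL_B;
    t->tpidr2_el0 = &ptr->blk;

    /* ZT0 has no lazy-save scheme in this model, so copy it eagerly. */
    if (t->has_zt0)
        memcpy(ptr->zt0, t->zt0, sizeof ptr->zt0);

    ptr->valid = 1;
}
```

After the call, PSTATE.ZA is still 1 in the model while ``TPIDR2_EL0`` points at ``PTR->BLK``, which is the dormant state that the custom ZA interface describes for an active entry.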

``__arm_sme_restore``
^^^^^^^^^^^^^^^^^^^^^

**(Beta)**

Platforms that support SME must provide a subroutine to restore any state
enabled by PSTATE.ZA.

* The subroutine is called ``__arm_sme_restore``.

* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
the following properties:

* X1-X15, X19-X29 and SP are call-preserved.
* Z0-Z31 are call-preserved.
* P0-P15 are call-preserved.

* The custom ``ZA`` interface has the following properties:

* If ZA state is 'off' on entry and SME state needs restoring, then it is

Review comment (Contributor):

active or dormant, since if ZA was dormant on entry to __arm_sme_save, and if the lazy save is later committed, we'll end up restoring the previous dormant state. Similarly for the final bullet point.

But I think we should move this down immediately before “The subroutine behaves as follows:” and then just say something like:

  • The ZA state on normal return is the same as the ZA state on entry to the __arm_sme_save call that initialized PTR.

(Sentence feels a bit clunky though, so improvements definitely welcome.)

'active' on normal return.
* If ZA state is 'off' on entry and SME state does not need restoring, then
it is 'off' on normal return.
* If ZA state is 'dormant' on entry, then it is 'active' on normal return.

* The subroutine takes the following arguments:

PTR
a 64-bit data pointer passed in X0 that points to a buffer which

Review comment (Contributor):

How about instead saying:

  • a 64-bit data pointer in X0 that must previously have been passed to __arm_sme_save. The buffer that it points to must still contain the data written by that call.

?

is guaranteed to have a size that is equal to or larger than the size
returned by `__arm_sme_state_size`_.

* The subroutine does not return a value.

* The subroutine behaves as follows:

* If ``PTR`` does not point to a valid buffer with the required size, the
behaviour of calling this routine is undefined.

Review comment (Contributor):

behavior (and similarly throughout).


* For the address ``PTR->VALID`` at an unspecified offset in the buffer,
if the value stored at address ``PTR->VALID`` is 0, then the subroutine does
nothing.

* Otherwise, the subroutine aborts in some platform-specific manner if
either of the following conditions is true:

* The current thread does not have access to SME.

* The current thread does not have access to ``TPIDR2_EL0`` when PSTATE.ZA

Review comment (Contributor):

Similarly, this can be deleted.

is enabled.

* ZA state on entry is 'active', meaning that PSTATE.ZA is enabled and
``TPIDR2_EL0`` is a NULL pointer.

* If PSTATE.ZA is disabled, the subroutine enables PSTATE.ZA.

* For addresses ``PTR->BLK`` and ``PTR->TPIDR2_EL0``
at unspecified offsets in the buffer pointed to by ``PTR``:

* If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to
``PTR->BLK`` and calls ``__arm_tpidr2_restore``.

* The contents of ``PTR->TPIDR2_EL0`` are copied to ``TPIDR2_EL0``.

* If ZT0 is available, then for the address ``PTR->ZT0`` at an
unspecified offset in the buffer pointed to by ``PTR``:

* The contents of ``PTR->ZT0`` are copied to ZT0.
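
The restore steps above can likewise be sketched as a host-side model. As with any such sketch, the details are assumptions: the buffer layout and the ``thread_model`` struct are hypothetical, the model returns ``false`` where the text says the routine aborts, a ``memcpy`` stands in for the call to ``__arm_tpidr2_restore``, and the ``TPIDR2_EL0`` access check and ZT0 handling are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MODEL_SVL_B = 16, MODEL_ZA_BYTES = MODEL_SVL_B * MODEL_SVL_B };

struct tpidr2_block {
    void *za_save_buffer;
    uint16_t num_za_save_slices;
};

/* Hypothetical layout; the real offsets are unspecified. */
struct sme_save_buf {
    uint64_t valid;                   /* PTR->VALID */
    struct tpidr2_block blk;          /* PTR->BLK */
    struct tpidr2_block *tpidr2_el0;  /* PTR->TPIDR2_EL0 */
    uint8_t za[MODEL_ZA_BYTES];       /* PTR->ZA */
};

/* Simulated per-thread state, including the live ZA contents. */
struct thread_model {
    bool has_sme;
    bool pstate_za;
    struct tpidr2_block *tpidr2_el0;
    uint8_t za[MODEL_ZA_BYTES];
};

/* Returns false where the text says the routine aborts. */
bool model_sme_restore(struct thread_model *t, const struct sme_save_buf *ptr)
{
    assert(t != NULL && ptr != NULL); /* an invalid PTR is undefined */

    /* Nothing was saved, so there is nothing to restore. */
    if (ptr->valid == 0)
        return true;

    /* Abort conditions: no SME access, or ZA active on entry. */
    if (!t->has_sme)
        return false;
    if (t->pstate_za && t->tpidr2_el0 == NULL)
        return false;

    /* Enable PSTATE.ZA if it was disabled. */
    t->pstate_za = true;

    /* A null TPIDR2_EL0 means the lazy save was committed, so ZA must be
     * reloaded from the buffer (the role of __arm_tpidr2_restore). */
    if (t->tpidr2_el0 == NULL)
        memcpy(t->za, ptr->blk.za_save_buffer, sizeof t->za);

    /* Reinstate the TPIDR2_EL0 value captured at save time. */
    t->tpidr2_el0 = ptr->tpidr2_el0;
    return true;
}
```

In the dormant-on-entry case the lazy save was never committed, so the model leaves the live ZA contents alone and only puts the saved ``TPIDR2_EL0`` value back.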


Pseudo-code examples
====================
