Address review comments.
* Renamed `compatible ZA` -> `agnostic ZA`

* Changed __arm_sme_save/restore such that the save routine should
  record whether ZA or ZT0 is saved and such that the restore routine
  should check whether ZA or ZT0 was saved. This removes the
  (previously) implicit assumption in `__arm_sme_save` that PSTATE.ZA
  must be 1 if PTR is not nullptr.

* ZA is now always saved/restored using the lazy-save mechanism.

* Changed __arm_sme_save/restore to have a custom ZA interface instead
  of 'agnostic-ZA' which was incorrect.
sdesmalen-arm committed Aug 22, 2024
1 parent aeccbc9 commit 6f1bba8
Showing 1 changed file with 67 additions and 78 deletions.
145 changes: 67 additions & 78 deletions aapcs64/aapcs64.rst
@@ -1753,7 +1753,8 @@ states of ZA on entry to a subroutine and the possible states of ZA on a

.. _`private-ZA`:
.. _`shared-ZA`:
.. _`compatible-ZA`:
.. _`agnostic-ZA`:
.. _`restore-ZA`:

+-------------------+-------------------+---------------------------+
| Type of interface | ZA state on entry | ZA state on normal return |
@@ -1762,7 +1763,7 @@ states of ZA on entry to a subroutine and the possible states of ZA on a
+-------------------+-------------------+---------------------------+
| shared ZA | active | active |
+-------------------+-------------------+---------------------------+
| compatible ZA | active, dormant | unchanged |
| agnostic ZA | active, dormant | unchanged |
| | or off | |
+-------------------+-------------------+---------------------------+

@@ -1780,12 +1781,12 @@ The shared-ZA interface is so called because it allows the subroutine
to share ZA contents with its caller. This can be useful if an SME
operation is split into several cooperating subroutines.

The compatible-ZA interface is intended to be called from any function
without requiring a change to PSTATE.ZA and is generally used in conjunction
with the expectation that it `preserves ZA`_.
The agnostic-ZA interface is intended to be called from any function without
requiring a change to PSTATE.ZA and must preserve all state associated with
PSTATE.ZA.

Subroutines with a `private-ZA`_ interface, `shared-ZA`_ interface or
`compatible-ZA`_ interface can (at their option) choose to guarantee that they
Subroutines with a `private-ZA`_ interface and subroutines with a `shared-ZA`_
interface can both (at their option) choose to guarantee that they
`preserve ZA`_.

Parameter passing
@@ -2094,10 +2095,10 @@ support routines:
state.

``__arm_sme_save``
Provides a safe way to save all state enabled by PSTATE.ZA.
Provides a safe way to save state enabled by PSTATE.ZA to a buffer.

``__arm_sme_restore``
Provides a safe way to restore all state enabled by PSTATE.ZA from a buffer.
Provides a safe way to restore state enabled by PSTATE.ZA from a buffer.


``__arm_sme_state``
@@ -2335,7 +2336,7 @@ that is large enough to represent all state enabled by PSTATE.ZA.

* The subroutine is called ``__arm_sme_state_size``.

* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with
* The subroutine has an `agnostic-ZA`_ `streaming-compatible interface`_ with
the following properties:

* X1-X15, X19-X29 and SP are call-preserved.
@@ -2351,42 +2352,37 @@ that is large enough to represent all state enabled by PSTATE.ZA.
+--------+-----------------------------------------+
| bits | Options |
+========+=========================================+
| 63 | Preserve ZA using lazy save mechanism |
+--------+-----------------------------------------+
| 62 - 3 | Zero for this revision of the AAPCS64, |
| 63 - 2 | Zero for this revision of the AAPCS64, |
| | but reserved for future expansion |
+--------+-----------------------------------------+
| 2 | Exclude TPIDR2_EL0 |
+--------+-----------------------------------------+
| 1 | Exclude ZT0 |
+--------+-----------------------------------------+
| 0 | Exclude ZA |
+--------+-----------------------------------------+

If bit 63 is 1, then the size calculation will include the allocation
of a TPIDR2 block.

A value of 0 means that all SME state will be considered in the size
calculation.

* The subroutine returns an unsigned double word in X0 that represents
a size in bytes that is large enough to represent all state enabled by
PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``.
The size is guaranteed to be a multiple of 16.
PSTATE.ZA, predicated on the requirements specified in ``OPTIONS``,
as well as any other state required for `__arm_sme_save`_ and
`__arm_sme_restore`_.

The layout that corresponds to the calculated size is unspecified,
but the assumption is that the size always matches the implementation
of the `__arm_sme_save`_ and `__arm_sme_restore`_ routines.
`__arm_sme_state_size`_ assumes that ZA is saved lazily and will account
for the save of ``TPIDR2_EL0``.

* The subroutine behaves as follows:
The exact layout used to calculate the size is unspecified. The
implementations of `__arm_sme_save`_ and `__arm_sme_restore`_ and
`__arm_sme_state_size`_ must all assume the same layout.

The size is guaranteed to be a multiple of 16.

* If both bit 0 and bit 63 of ``OPTIONS`` are 1, then the subroutine aborts
in some platform-specific manner.
* The subroutine behaves as follows:

* If the current thread has access to FEAT_SME and PSTATE.ZA is 1,
X0 contains the total size required to represent all SME state enabled
under PSTATE.ZA, with the exception of the state requested to be excluded
as described in ``OPTIONS``.
under PSTATE.ZA predicated on the requirements specified in ``OPTIONS``.

* Otherwise, X0 is 0.
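
The size rule described in this hunk can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the real buffer layout is unspecified, so the 16-byte flag header, the 16-byte slot for the saved ``TPIDR2_EL0`` and the name ``model_state_size`` are hypothetical. Only the ZA size (SVL.B x SVL.B bytes), the ZT0 size (64 bytes) and the 16-byte TPIDR2 block come from the architecture and the lazy-save scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* OPTIONS bits as listed in the revised table: bits 63-2 are reserved. */
#define OPT_EXCLUDE_ZA  (UINT64_C(1) << 0)
#define OPT_EXCLUDE_ZT0 (UINT64_C(1) << 1)

/*
 * Hypothetical model of __arm_sme_state_size. The thread state that the
 * real routine reads from the hardware is passed in as plain parameters.
 */
static uint64_t model_state_size(bool has_sme, bool pstate_za, bool has_zt0,
                                 uint64_t svl_b, uint64_t options)
{
    /* No SME access, or ZA is off: nothing needs to be saved. */
    if (!has_sme || !pstate_za)
        return 0;

    uint64_t size = 16;                 /* assumed SAVED_ZA/SAVED_ZT0 flags */

    if (!(options & OPT_EXCLUDE_ZA)) {
        size += 16;                     /* assumed slot for saved TPIDR2_EL0 */
        size += 16;                     /* TPIDR2 block used by the lazy save */
        size += svl_b * svl_b;          /* ZA: SVL.B x SVL.B bytes */
    }
    if (!(options & OPT_EXCLUDE_ZT0) && has_zt0)
        size += 64;                     /* ZT0: 512 bits */

    return (size + 15) & ~UINT64_C(15); /* round up to a multiple of 16 */
}
```

In this model the TPIDR2 block and the ``TPIDR2_EL0`` slot are counted whenever ZA is included, which mirrors the statement that the size accounts for a lazy save of ZA.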

@@ -2401,7 +2397,7 @@ by PSTATE.ZA.

* The subroutine is called ``__arm_sme_save``.

* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with
* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
the following properties:

* X2-X15, X19-X29 and SP are call-preserved.
@@ -2416,13 +2412,9 @@ by PSTATE.ZA.
+--------+-----------------------------------------+
| bits | Options |
+========+=========================================+
| 63 | Preserve ZA using lazy save mechanism |
+--------+-----------------------------------------+
| 62 - 3 | Zero for this revision of the AAPCS64, |
| 63 - 2 | Zero for this revision of the AAPCS64, |
| | but reserved for future expansion |
+--------+-----------------------------------------+
| 2 | Exclude TPIDR2_EL0 |
+--------+-----------------------------------------+
| 1 | Exclude ZT0 |
+--------+-----------------------------------------+
| 0 | Exclude ZA |
@@ -2439,39 +2431,39 @@ by PSTATE.ZA.

* The subroutine behaves as follows:

* If PTR is null, then the subroutine does nothing.

* The subroutine aborts in some platform-specific manner if either of the
following conditions is true:

* the current thread does not have access to SME or PSTATE.ZA is 0.
* The current thread does not have access to SME.

* The current thread does not have access to ``TPIDR2_EL0`` when
PSTATE.ZA is 1.

* the current thread does not have access to TPIDR2_EL0 and bit 2 of
``OPTIONS`` is 0.
* If ``PTR`` does not point to a valid buffer with the required size, the
behaviour of calling this routine is undefined.

* both bit 0 and bit 63 of ``OPTIONS`` are 1.
* If PSTATE.ZA is 0, the subroutine does nothing.

* For addresses ``PTR->BLK``, ``PTR->ZA_BUFFER``, ``PTR->ZA``, ``PTR->ZT0``
and ``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by
PTR:
* If bit 0 of ``OPTIONS`` is 0, then for addresses ``PTR->SAVED_ZA``,
``PTR->BLK``, ``PTR->ZA`` and ``PTR->TPIDR2_EL0`` at unspecified offsets
in the buffer pointed to by ``PTR``:

* If bit 2 of ``OPTIONS`` is 0, the subroutine stores the contents of
TPIDR2_EL0 to ``PTR->TPIDR2_EL0``.
* The full contents of ``TPIDR2_EL0`` are written to ``PTR->TPIDR2_EL0``.

* If bit 63 of ``OPTIONS`` is 1, then the subroutine sets up a lazy-save by
storing the address ``PTR->ZA_BUFFER`` to ``PTR->BLK.za_save_buffer``,
storing the streaming vector length in bytes (``SVL.B``) to
``PTR->BLK.num_za_save_slices`` and copying the address ``PTR->BLK`` to
``TPIDR2_EL0``.
* The address ``PTR->ZA`` is written to ``PTR->BLK.za_save_buffer``,
the streaming vector length in bytes (``SVL.B``) is written to
``PTR->BLK.num_za_save_slices`` and the address ``PTR->BLK`` is
written to ``TPIDR2_EL0``, thus setting up a lazy save.

* If bit 0 of ``OPTIONS`` is 0, then the subroutine stores the entire
contents of ZA to ``PTR->ZA_BUFFER``.
* The value 1 is written to ``PTR->SAVED_ZA``.

* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine
stores the full contents of ZT0 to ``PTR->ZT0``.
* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then for the addresses
``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified offsets in the
buffer pointed to by ``PTR``:

* If bit 63 of ``OPTIONS`` is 0, then the subroutine disables PSTATE.ZA.
* The full contents of ZT0 are written to ``PTR->ZT0``.

* The value 1 is written to ``PTR->SAVED_ZT0``.
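
The revised save behaviour can be sketched as a host-side model. Everything here is an assumption made for illustration: ``struct model_cpu`` stands in for the thread's SME state, ``struct model_buffer`` is one possible layout (the real one is unspecified), ZA is sized for SVL.B = 16 for brevity, and the caller is assumed to zero the buffer beforehand so that an untouched flag reads as "not saved".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OPT_EXCLUDE_ZA  (UINT64_C(1) << 0)
#define OPT_EXCLUDE_ZT0 (UINT64_C(1) << 1)
#define MODEL_SVL_B     16              /* assumed streaming vector length */

/* TPIDR2 block used by the lazy-save scheme. */
struct tpidr2_block {
    void    *za_save_buffer;
    uint16_t num_za_save_slices;
    uint8_t  reserved[6];
};

/* Hypothetical stand-in for the thread's SME-related state. */
struct model_cpu {
    bool      pstate_za;
    bool      has_zt0;
    uintptr_t tpidr2_el0;
    uint8_t   zt0[64];
};

/* Hypothetical buffer layout; field names follow the PTR->... names. */
struct model_buffer {
    uint64_t  saved_za;
    uint64_t  saved_zt0;
    uintptr_t tpidr2_el0;
    struct tpidr2_block blk;
    uint8_t   za[MODEL_SVL_B * MODEL_SVL_B];
    uint8_t   zt0[64];
};

/* Model of the revised __arm_sme_save behaviour. */
static void model_sme_save(struct model_cpu *cpu, uint64_t options,
                           struct model_buffer *ptr)
{
    if (!cpu->pstate_za)
        return;                         /* ZA is off: do nothing */

    if (!(options & OPT_EXCLUDE_ZA)) {
        /* Remember the caller's TPIDR2_EL0, then set up a lazy save. */
        ptr->tpidr2_el0 = cpu->tpidr2_el0;
        ptr->blk.za_save_buffer = ptr->za;
        ptr->blk.num_za_save_slices = MODEL_SVL_B;
        cpu->tpidr2_el0 = (uintptr_t)&ptr->blk;
        ptr->saved_za = 1;
    }
    if (!(options & OPT_EXCLUDE_ZT0) && cpu->has_zt0) {
        memcpy(ptr->zt0, cpu->zt0, sizeof ptr->zt0);
        ptr->saved_zt0 = 1;
    }
}
```

Note that the model never copies ZA itself. With a lazy save, the contents are written to ``PTR->ZA`` only if a later callee commits the save.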

``__arm_sme_restore``
^^^^^^^^^^^^^^^^^^^^^
@@ -2483,7 +2475,7 @@ enabled by PSTATE.ZA.

* The subroutine is called ``__arm_sme_restore``.

* The subroutine has a `compatible-ZA`_ `streaming-compatible interface`_ with
* The subroutine has a custom ``ZA`` `streaming-compatible interface`_ with
the following properties:

* X2-X15, X19-X29 and SP are call-preserved.
@@ -2498,13 +2490,9 @@ enabled by PSTATE.ZA.
+--------+-----------------------------------------+
| bits | Options |
+========+=========================================+
| 63 | Preserve ZA using lazy save mechanism |
+--------+-----------------------------------------+
| 62 - 3 | Zero for this revision of the AAPCS64, |
| 63 - 2 | Zero for this revision of the AAPCS64, |
| | but reserved for future expansion |
+--------+-----------------------------------------+
| 2 | Exclude TPIDR2_EL0 |
+--------+-----------------------------------------+
| 1 | Exclude ZT0 |
+--------+-----------------------------------------+
| 0 | Exclude ZA |
@@ -2521,34 +2509,35 @@ enabled by PSTATE.ZA.

* The subroutine behaves as follows:

* If PTR is null, then the subroutine does nothing.

* The subroutine aborts in some platform-specific manner if either of the
following conditions is true:

* the current thread does not have access to SME or PSTATE.ZA is 0.
* The current thread does not have access to SME.

* The current thread does not have access to ``TPIDR2_EL0`` when
PSTATE.ZA is 1.

* the current thread does not have access to TPIDR2_EL0 and bit 2 of
``OPTIONS`` is 0.
* If ``PTR`` does not point to a valid buffer with the required size, the
behaviour of calling this routine is undefined.

* both bit 0 and bit 63 of ``OPTIONS`` are 1.
* For addresses ``PTR->SAVED_ZA``, ``PTR->BLK`` and ``PTR->TPIDR2_EL0``
at unspecified offsets in the buffer pointed to by ``PTR``, if
``PTR->SAVED_ZA`` is 1 and bit 0 of ``OPTIONS`` is 0, then:

* If PSTATE.ZA is 0, then the subroutine enables PSTATE.ZA.
* If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA.

* For addresses ``PTR->BLK``, ``PTR->ZA``, ``PTR->ZT0`` and
``PTR->TPIDR2_EL0`` at unspecified offsets in the buffer pointed to by PTR:
* If ``TPIDR2_EL0`` is a NULL pointer, then the subroutine points X0 to
``PTR->BLK`` and calls ``__arm_tpidr2_restore``.

* If bit 63 of ``OPTIONS`` is 1 and TPIDR2_EL0 is null, then the function
copies ``PTR->BLK`` to X0 and calls ``__arm_tpidr2_restore``.
* The contents of ``PTR->TPIDR2_EL0`` are copied to ``TPIDR2_EL0``.

* If bit 0 of ``OPTIONS`` is 0, then the subroutine restores the entire
contents of ZA from ``PTR->ZA``.
* For addresses ``PTR->SAVED_ZT0`` and ``PTR->ZT0`` at unspecified
offsets in the buffer pointed to by ``PTR``, if ``PTR->SAVED_ZT0`` is 1
and bit 1 of ``OPTIONS`` is 0, then:

* If bit 1 of ``OPTIONS`` is 0 and ZT0 is available, then the subroutine
restores the entire contents of ZT0 from ``PTR->ZT0``.
* If PSTATE.ZA is 0, the subroutine enables PSTATE.ZA.

* If bit 2 of ``OPTIONS`` is 0, the subroutine restores the contents of
TPIDR2_EL0 from ``PTR->TPIDR2_EL0``.
* The full contents of ``PTR->ZT0`` are copied to ZT0.
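
The revised restore behaviour can be sketched in the same host-side style. As before, this is a model under stated assumptions rather than the actual routine: ``struct model_cpu``, ``struct model_buffer`` and ``model_tpidr2_restore`` (a stand-in for ``__arm_tpidr2_restore``) are hypothetical, and ZA is sized for SVL.B = 16.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OPT_EXCLUDE_ZA  (UINT64_C(1) << 0)
#define OPT_EXCLUDE_ZT0 (UINT64_C(1) << 1)
#define MODEL_SVL_B     16              /* assumed streaming vector length */

struct tpidr2_block {
    void    *za_save_buffer;
    uint16_t num_za_save_slices;
    uint8_t  reserved[6];
};

/* Hypothetical stand-in for the thread's SME-related state. */
struct model_cpu {
    bool      pstate_za;
    uintptr_t tpidr2_el0;
    uint8_t   za[MODEL_SVL_B * MODEL_SVL_B];
    uint8_t   zt0[64];
};

/* Hypothetical buffer layout; the real layout is unspecified. */
struct model_buffer {
    uint64_t  saved_za;
    uint64_t  saved_zt0;
    uintptr_t tpidr2_el0;
    struct tpidr2_block blk;
    uint8_t   za[MODEL_SVL_B * MODEL_SVL_B];
    uint8_t   zt0[64];
};

/* Stand-in for __arm_tpidr2_restore: reload ZA from the block's buffer. */
static void model_tpidr2_restore(struct model_cpu *cpu,
                                 const struct tpidr2_block *blk)
{
    memcpy(cpu->za, blk->za_save_buffer,
           (size_t)blk->num_za_save_slices * MODEL_SVL_B);
}

/* Model of the revised __arm_sme_restore behaviour. */
static void model_sme_restore(struct model_cpu *cpu, uint64_t options,
                              const struct model_buffer *ptr)
{
    if (ptr->saved_za == 1 && !(options & OPT_EXCLUDE_ZA)) {
        cpu->pstate_za = true;          /* enable ZA if it was off */
        /* A null TPIDR2_EL0 means the lazy save was committed. */
        if (cpu->tpidr2_el0 == 0)
            model_tpidr2_restore(cpu, &ptr->blk);
        cpu->tpidr2_el0 = ptr->tpidr2_el0;
    }
    if (ptr->saved_zt0 == 1 && !(options & OPT_EXCLUDE_ZT0)) {
        cpu->pstate_za = true;
        memcpy(cpu->zt0, ptr->zt0, sizeof cpu->zt0);
    }
}
```

The model keys every action on the ``SAVED_ZA`` and ``SAVED_ZT0`` flags rather than on PSTATE.ZA at the time of the call, which is the change the commit message describes.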


Pseudo-code examples
