This repository has been archived by the owner on Aug 12, 2022. It is now read-only.

How to deal with wrong target address prediction of link stack #41

Open
Grubby-CPU opened this issue May 13, 2021 · 1 comment

Comments

@Grubby-CPU

Hi Guys,

Based on my understanding, the link stack can give a wrong prediction of the branch target address. How does A2 detect this case and flush the pipeline? I did not find any related logic in iuq_bp.vhdl.

Many thanks

@openpowerwtf
Collaborator

In what cases do you think it gives the wrong prediction? Stack overflow? Other cases where stack is messed up relative to stream? There would have to be a 'final arbiter' later in the pipe to validate the target address.

Looking around iuq_bp.vhdl, this seems to be where the redirect address and its valid are created...

iu5_redirect_ifar_d(EFF_IFAR'left to 61)        <= iu4_lnk(EFF_IFAR'left to 61) when iu4_bclr = '1' else
                                                   iu4_bta(EFF_IFAR'left to 61);

iu5_redirect_tid_d(0 to 3)                      <= iu4_redirect_tid(0 to 3) and not iu4_flush_tid(0 to 3);

-- came from this...

iu3_br_pred(0 to 3)             <= iu3_br_val(0 to 3) and
                                   (iu3_br_hard(0 to 3) or
                                   (iu3_hint_val(0 to 3) and iu3_hint(0 to 3)) or
                                   (iu3_br_dynamic(0 to 3) and iu3_br_hist0(0 to 3)) or
                                   (iu3_br_static(0 to 3)));

-- which depends on 'predecode bits'
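
To make the two VHDL fragments above easier to reason about, here is a rough Python model of the same boolean logic. It is only a sketch: each 4-bit vector is packed into an int, the function names are made up for illustration, and nothing here says what the four bit positions mean in the real design.

```python
MASK = 0xF  # the VHDL vectors above are 4 bits wide (0 to 3)


def br_pred(br_val, br_hard, hint_val, hint, br_dynamic, br_hist0, br_static):
    """Model of the iu3_br_pred expression, evaluated bitwise on 4-bit ints."""
    taken = (
        br_hard
        | (hint_val & hint)
        | (br_dynamic & br_hist0)
        | br_static
    )
    return br_val & taken & MASK


def redirect_ifar(iu4_bclr, iu4_lnk, iu4_bta):
    """Model of the iu5_redirect_ifar_d mux: link stack value for BCLR, else the BTA."""
    return iu4_lnk if iu4_bclr else iu4_bta


def redirect_tid(iu4_redirect_tid, iu4_flush_tid):
    """Model of iu5_redirect_tid_d: a flush masks the redirect valid."""
    return iu4_redirect_tid & ~iu4_flush_tid & MASK
```

Note that nothing in this logic checks whether the link stack value is right. It only selects it, which is consistent with the final check living later in the pipe.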

From Manual [2.9 Branch Processing] - good info; appears that XU does the final check on target address...

Branch Conditional to Link Register
Incoming BCLR instructions obtain a BTA from the branch predictor's link stack. The link stack is a LIFO
buffer designed to keep track of nested subroutines. It holds a list of potential LINK register contents, which
are maintained based on subroutine calls and returns. A subroutine call is defined as any taken branch where
instruction field LK = '1'. When a subroutine call is detected, the NIA (incremented IFAR) is pushed onto the
top of the link stack because this is the location to which the subroutine will return. A subroutine return is
defined as a taken branch conditional to LR (BCLR) where instruction field BH = '00' (while this is kept as a
condition for a subroutine return, it is generally assumed that all BCLR instructions are intended as subroutine
returns). When a subroutine return is detected, a previously stored NIA is popped off the top of the link stack,
and used as a BTA for the current BCLR instruction. In the event of nested subroutines, multiple consecutive
calls are followed by multiple consecutive returns, with the LIFO structure of the link stack keeping everything
ordered properly. The link stack is isolated and replicated per thread to maintain proper instruction flow in and
out of the buffer. Each stack is four entries deep, and wide enough to accommodate the entire IFAR (potentially
62 bits). A pointer is used to define the top of the stack.
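
The push/pop behavior described above can be sketched as a small Python model. This is a behavioral illustration only, not the RTL: the class name, the pointer convention (the pointer indexes the current top entry), and the reset values are assumptions.

```python
DEPTH = 4  # "Each stack is four entries deep"


class LinkStack:
    """Per-thread LIFO of predicted return addresses (hypothetical model)."""

    def __init__(self):
        self.entries = [0] * DEPTH
        self.ptr = 0  # assumed convention: index of the current top of stack

    def call(self, nia):
        """Taken branch with LK = '1': push the NIA (the return location)."""
        self.ptr = (self.ptr + 1) % DEPTH
        self.entries[self.ptr] = nia

    def ret(self):
        """Taken BCLR treated as a return: pop the top entry and use it as the BTA."""
        bta = self.entries[self.ptr]
        self.ptr = (self.ptr - 1) % DEPTH
        return bta
```

For two nested calls that push 0x10 and then 0x20, the two returns get 0x20 and then 0x10, which is the LIFO ordering the manual describes.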

Misalignment
In the event of a stack misalignment, the stack must be realigned. Misalignment occurs when the branch
direction for a subroutine call/return is predicted incorrectly and the stack pointer is consequently moved to
the wrong location. Realignment of the stack pointer relies on the use of a shadow pointer. The shadow
pointer is governed by the same rules as the stack pointer, except that it acts on resolved branches instead of
predicted branches. This guarantees that the value of the shadow pointer is always correct (even though the
data is too old to be useful to the branch predictor under normal circumstances). Any time the execution unit
flushes (whether due to a branch misprediction or not), the stack pointer is overwritten with the value of the
shadow pointer. The shadow pointer becomes valid for predictions at this point because all branch instruc-
tions that have not yet been resolved by the execution unit will be flushed with the rest of the pipeline. In the
special case that a subroutine call was predicted not taken, then resolved taken, simple realignment is not
sufficient. The top of the realigned stack must also be updated with the subroutine call's NIA.
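
A rough Python sketch of the shadow pointer scheme, under stated assumptions: the class and method names are hypothetical, the pointer indexes the top entry, and the caller is assumed to update the shadow pointer for the resolved branch before signalling the flush.

```python
DEPTH = 4


class LinkStackWithShadow:
    """Link stack whose pointer follows predictions, plus a shadow pointer
    that follows resolved branches (hypothetical behavioral model)."""

    def __init__(self):
        self.entries = [0] * DEPTH
        self.ptr = 0     # moved by predicted calls/returns
        self.shadow = 0  # moved by resolved calls/returns

    def predict_call(self, nia):
        self.ptr = (self.ptr + 1) % DEPTH
        self.entries[self.ptr] = nia

    def predict_return(self):
        bta = self.entries[self.ptr]
        self.ptr = (self.ptr - 1) % DEPTH
        return bta

    def resolve_call(self):
        self.shadow = (self.shadow + 1) % DEPTH

    def resolve_return(self):
        self.shadow = (self.shadow - 1) % DEPTH

    def flush(self, missed_call_nia=None):
        """Any execution-unit flush: realign the stack pointer to the shadow.
        If a call was predicted not taken but resolved taken, its NIA was
        never pushed, so it is also written to the realigned top."""
        self.ptr = self.shadow
        if missed_call_nia is not None:
            self.entries[self.ptr] = missed_call_nia
```

Example: after a real call, a wrong-path BCLR that is predicted taken pops the stack and leaves the pointer misaligned. The flush that follows copies the shadow pointer back, so the real return still sees the right entry.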

Overflow
Because the link stack is only four entries deep, the logic can only handle four nested subroutines before
overflowing. In the case of an overflow, the stack pointer wraps and continues storing NIAs, overwriting
existing data in the oldest locations. In this way, the link stack is always able to return BTAs for the four most
recent nested subroutine calls. If the nesting has gone deeper than this, the link stack returns garbage for
anything less recent. This is unavoidable. A deeper stack could reduce the impact at the expense of area.
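
To illustrate the overflow case, here is a small Python sketch that nests more calls than the stack can hold and then unwinds them. The function name and pointer convention are assumptions; only the depth of four and the wrap-and-overwrite behavior come from the text above.

```python
def nested_returns(call_nias, depth=4):
    """Push every NIA in call_nias, then pop once per call.
    Returns the BTAs the link stack would predict, in return order."""
    entries = [None] * depth
    ptr = 0
    for nia in call_nias:
        ptr = (ptr + 1) % depth   # pointer wraps on overflow
        entries[ptr] = nia        # and overwrites the oldest entry
    predicted = []
    for _ in call_nias:
        predicted.append(entries[ptr])
        ptr = (ptr - 1) % depth
    return predicted


calls = [0x10, 0x20, 0x30, 0x40, 0x50]  # five nested calls, one too many
predicted = nested_returns(calls)
actual = list(reversed(calls))          # what the LR would really hold
wrong = [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]
```

In this model the four most recent returns are predicted correctly and only the fifth (outermost) return gets a stale value, so `wrong` ends up as `[4]`.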

Corruption
It should be noted that there is a danger of BTA corruption in the case of BCLR instructions, due to either
stack misalignment or overflow conditions. The XU must compare the predicted BTA against the executed
BTA for all BCLRs and flag a misprediction if they fail to match.
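
The check described in this last paragraph can be sketched in Python as follows. This only illustrates the comparison the manual describes; the names and the returned structure are hypothetical and do not correspond to actual XU signals.

```python
from typing import NamedTuple, Optional


class Resolution(NamedTuple):
    mispredict: bool
    redirect_to: Optional[int]  # address to refetch from after the flush


def resolve_bclr(predicted_bta, executed_bta):
    """Compare the link stack's predicted BTA with the target computed at
    execution (the real LR contents) and flag a misprediction on mismatch."""
    if predicted_bta == executed_bta:
        return Resolution(mispredict=False, redirect_to=None)
    return Resolution(mispredict=True, redirect_to=executed_bta)
```

So a garbage entry from a misaligned or overflowed stack is harmless for correctness: it costs a flush and a refetch from the executed target.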
