This repository has been archived by the owner on Jul 6, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
lecture14.tex
67 lines (60 loc) · 3.65 KB
/
lecture14.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
\chapter{RISC-V 5-Stage Pipeline/Hazards}
\section{RISC-V Pipeline}
Pipelining doesn't help with the \emph{latency} of a single task (time to travel from source to destination), but rather improves the \emph{throughput} of the entire workload (number of packets processed within a period of time). All pipeline stages have the same duration.
\begin{enumerate}
\item Instruction Fetch: Send address to the instruction memory (IMEM) and read IMEM at that address.
\item Register Decode: Generate control signals from the instruction bits, generate the immediate, and read registers from the RegFile.
\item Execute: Perform ALU operations, and do branch comparison.
\item Memory: Read from or write to the data memory (DMEM).
\item Writeback: Write back either PC + 4, the result of the ALU operation, or data from memory to the RegFile.
\end{enumerate}
\section{Hazards}
\subsection{Structural}
\begin{description}
\item[Problem:] Two or more instructions in the pipeline compete for access to a single physical resource.
\item[Solution 1:] Instructions take turns to use resource, some instructions have to stall.
\item[Solution 2:] Add more hardware to machine (e.g. support simultaneous memory accesses).
\end{description}
\subsection{Data}
\begin{itemize}
\item Register:
\begin{description}
\item[Problem:] What happens if at one pipeline stage a register is being written and read in a single clock cycle by different instructions?
\item[Solution:] You can do a \emph{write} in half a clock cycle and \emph{read} in the next half, so there is no danger after all.
\end{description}
\item ALU Result:
\begin{description}
\item[Problem:] Instruction depends on result from previous instruction.
\item[Solution 1] Stall by filling the instruction pipeline with NOPs until the first instruction's ALU output is ready for the second instruction.
\begin{itemize}
\item Compiler that is aware of the pipeline structure can arrange code in order to avoid hazards and stalls.
\end{itemize}
\item[Solution 2:] Forwarding- Grab operand from pipeline stage rather than register file.
An ALU output is routed straight back into the ALU, with no need for register write.
\begin{enumerate}
\item Detect need for forwarding: Compare \texttt{inst1.rd} and \texttt{inst2.rs1}; if equal, we will do forwarding.
\item ALU output register and register write data are muxed back into ALU input.
\end{enumerate}
\end{description}
\item Load:
\begin{description}
\item[Problem:] We do not have the value of the load in the ALU phase, i.e. we cannot forward backwards in time.
\item[Solution:] Stall one instruction before you can forward.
The slot after a load is called a \emph{load delay slot}. If that instruction uses the result of the load, then the hardware will stall for one cycle.
\begin{itemize}
\item Put unrelated instruction into load delay slot $\implies$ no performance loss.
\end{itemize}
\end{description}
\end{itemize}
\subsection{Control}
\begin{description}
\item[Problem:] Instruction 1 has a branch. The next two instructions are executed regardless of branch outcome.
\item[Solution:] If the branch is not taken, then instructions fetched sequentially after branch are correct. If the branch/jump is taken, then we need to flush (kill) incorrect instructions from pipeline by converting to NOPs.
\end{description}
\section{Superscalar Processors}
How to increase single processor core performance?
\begin{enumerate}
\item Clock rate: Limited by technology and power dissipation
\item Pipeling: "Overlap" instruction execution
\item Multiple issue superscalar: Multiple pipelines; out-of-order execution (reorder instructions dynamically in hardware)
\end{enumerate}