Folien	Chapter	Video
MIPS64	02	Appendix A and K	1.2.1 - 1.2.4
Pipelining & Harzard	03	Appendix C.1	1.3.1 - 1.3.10
Multicycle Operations & Harzard	03	Appendix C.5	1.4.1 - 1.4.2
MIPS R4000	03	Appendix C.6	1.4.3
Scheduling	04	Chapter 3.2	1.4.4 - 1.4.7

MIPS64

Folien 02 / Appendix A and K / Video 1.2.1 - 1.2.4

simple load-store instruction set
- Only load/store instructions access memory
- To process data, need to load it into register first
designed for pipelining efficiency
- Fixed instruction set encoding
efficient instruction set for compiler to target

Registers

has 32 64-bit general-purpose registers (GPRs) R0 - R31
- R0 is a hardwired 0
has 32 64-bit floating-point registers (FPRs) F0 - F31
- to hold a 32-bit single precision floating point or 64-bit double precision floating point
- when used as single precision float, half of FPR is unused
few special registers
- hi and lo:
  - eg. for multiply: n bit * n bit = 2n bit which can not be placed in a n bit register
- floating-point status register
- exception program counter

Data Types

Integer:
- 8-bit: bytes (char)
- 16-bit: half words (short, Unicode)
- 32-bit: words (only in mips)
- 64-bit: double words
  - 32-bit to 64-bit:
    
    to increase the amount of directly addressable memory
Operations process 64-bit integers
- 如果使用bytes等小于64位的integer，需要用zero extended或sign extended转换成64位integer
  - zero extended：
    
    free positions are filled with zeros
  - sign extended：
    
    a sign bit will be copied into the free position
Floating-point
- 32-bit single-precision (float)
- 64-bit double-precision (double)

Addressing Modes

Mode bit specifies:
- Big Endian
- Little Endian
- 图中为32bits整数

Instruction Formats

32 bits long
R type: <Register direct>
I type: <Immediate>
- sign extended
J type:
- 26 bits instruction address，地址是64位
- 怎么从26变成64？
- When a J-type instruction is executed, a full target address is formed by concatenating [1] the high order bits of the PC (the address of the instruction following the jump), the [2] 26 bits of the target field, and [3] 2 zero bits.
- [2] 26 bits of the target field
- [3] 2 zero bits：因为地址/指令地址是对齐的(4Bytes 32bits: 比如地址结尾是 …000 的后一条指令地址是 …100 , 000同时占用了001,010,011 四格) 最后2bits可以省略 (相当于除以4/右移2位)
- [1] high order bits of the PC: 指令只接受64位输入，也就是说28-63位的值由当前PC提供 (即 $(PC+4)_{63..28}$）

MIPS64	长度
整数寄存器	64 bits
浮点数寄存器	64 bits
指令	32 bits
指令opcode	6 bits
寄存器数量	32个 = 5 bits = 2的5次方: 1. 两种寄存器各32个，2.R/I指令的rs/rt/rd部分都是5位长的
地址	64 bits (一个寄存器能存下的长度)
* J指令地址部分	26 bits (32bits的指令-6bits的opcode) 但地址是64bits的， 1. 26位的地址补两个0变成28位(0-27)，因为地址是4byte对齐的所以补两个0没关系 2. 28位的地址加上当前PC提供第28-63位，补充成64位(0-64)
各种数据	8 bits - 64 bits
bytes / Char	8 bits
half words / Short / Unicode	16 bits
words	32 bits
double words	64 bits
single-precision floating-point / Float	32 bits
double-precision floating-point / Double	64 bits
指令操作时处理的数据	64 bits (不足的需要补成64位长度比如8bit的char在操作中会补充成64bit)

Basic Code Optimization

Eliminating redundant branches
Eliminating induction variables
Loop unrolling

Pipelining

Classical 5-Stage Pipeline

Folien 03 / Appendix C.1 / Video 1.3.1 - 1.3.10

IF: Instruction fetch
- Fetch instruction using PC
- PC=PC+4 *(4×8=32 每条指令的长度)
ID: Instruction decode
- decode
- read rs and rt fields in each instruction
- read IMM as sign extended 64bit Integer
- pass destination register rd and the next sequential PC to ID/EX pipeline register
EX: Execute
- ALU
MEM: Memory access
- Load + Store
WB: Write-back cycle
- Write to register

Advantage

all MIPS instructions same length
few instruction formats, source registers always in same place
only loads and stores access memory (MIPS is load/store architecture)

Hazard

structural hazards: HW cannot support parallel execution of instructions
- 1 Mem w/ 1 read/write port
  - IF,MEM can not be in a same Clock Cycle
data hazards: instr. depends on result of prior instr. still in the pipeline
- Reg-Reg的矛盾第一条指令的写Reg在CC5，第二条指令的读Reg在CC3
  - Forwarding / Bypassing
    - Using Pipeline Register (不需要写不需要读第一条ALU计算好的直接进第二条ALU)
    - write in the first half Clock Cycle, read in the second half Clock Cycle
- **load use data hazard ** 在forwarding基础上第一条指令load在MEM结束后读完CC4后半期，第二条指令需要ALU在CC4前半期
  - Pipeline Interlock
    - Bubble：空几个CC
  - Scheduling
control(/branch) hazards: delay between fetching control flow instruction (branch or jump) and actual jump
- 分支指令在ALU比较完决定是否跳转找到跳转地址前有3个CC的空隙(penalty)
- Solution
  - Determine Branch taken / not taken sooner
  - Compute Branch Target address earlier
- MIPS solution
  - Move branch condition evaluation to ID stage
  - Calculate branch target address in ID stage
  - Always execute instruction after the branch (branch delay slot)
- 这样做以后只有1个CC penalty 这个penalty 用一个指令来填满 这个指令一定会被运行(不能对之后有影响)
  - From before branch instruction
  - From target address: only valuable when branch taken
  - From fall through: only valuable when branch not taken
  - 都没有则用nop：dadd r0,r0,r0

Multicycle Operations

Folien 03 / Appendix C.5 / Video 1.4.1 - 1.4.2

Certain instructions (e.g., mul, div, floating-point operations) do not execute in one cycle

for example:

Hazard

Structural Hazard
- Divider not pipelined （Division只能全部做完再开始下一个）
  - stall and scheduling
- Varying latencies: write at the same cycle
  - stall
  - increase write ports
WAW(write-after-write) Hazard
- do not always write back in order
  - busy bit
  - scheduling
instruction complete out of order
RAW(read-after-write) Hazard
- this instruction depends on the instructions before
  - stall
Complexity of forwarding logic increases

MIPS R4000

Folien 03 / Appendix C.6 / Video 1.4.3

Scheduling

Folien 04 / Chapter 3.2 / Video 1.4.4 - 1.4.7

to solve the stall cycles in Multicycle Operations RAW Hazard

Swap the position of the instructions

to use unrelated instruction
- to fill the stall
- to fill the nop after branch instruction
Loop Unrolling
register renaming

moving all loads before stores
moving two SDs after DADDI and BNE
using different registers

Compiler Steps

Eliminate extra test and branch instructions and adjust loop termination and iteration code
Determine that it is possible to move S.D after DADDI and BNE, and find amount to adjust S.D offset
Determine that loop unrolling would be useful by finding that loop iterations were independent
Rename registers to avoid name dependencies
Determine loads and stores in unrolled loop can be interchanged by observing that loads and stores from different iterations are independent
- requires analyzing memory addresses and finding that they do not refer to the same address.
Schedule the code, preserving any dependences needed to yield same result as original code

Advantages and Disadvantages

+improves potential for instruction scheduling

+reduces loop overhead

-increases code size

-shortfall in registers (register pressure)

计算机体系结构-02

MIPS, Pipeline and Instruction Scheduling

目录

MIPS64

Registers

Data Types

Addressing Modes

Instruction Formats

Basic Code Optimization

Pipelining

Classical 5-Stage Pipeline

Advantage

Hazard

Multicycle Operations

Hazard

MIPS R4000

Scheduling

Compiler Steps

Advantages and Disadvantages

CATALOG

RELATED

计算机体系结构

NEW POSTS

目录

MIPS64

Registers

Data Types

Addressing Modes

Instruction Formats

Basic Code Optimization

Pipelining

Classical 5-Stage Pipeline

Advantage

Hazard

Multicycle Operations

Hazard

MIPS R4000

Scheduling

Compiler Steps

Advantages and Disadvantages

CATALOG

RELATED 计算机体系结构

NEW POSTS

RELATED

计算机体系结构