What is “Computer Architecture”?
[Figure: levels of abstraction. High level: Application, Compiler, Operating System. Middle: INSTRUCTION SET ARCHITECTURE. Low level: Processor Architecture, I/O System, Digital Design, VLSI Circuit Design.]
What is “Instruction Set Architecture (ISA)”?
• Instruction (or Operation Code) Set
• Organization of Programmable Storage (main memory etc.)
• Modes of Addressing and Accessing Data Items and Instructions
• Behaviour on Exceptional Conditions (e.g. hardware divide by 0)
PROCESSOR
MU0 - A Very Simple Processor
[Figure: block diagram of a CPU containing the Program Counter, Instruction Register, Accumulator and Arithmetic Logic Unit, connected to Memory]
MU0 DESIGN
• Program Counter (PC)
• Accumulator (A)
• Instruction Register (IR)
• Arithmetic Logic Unit (ALU)
MU0 Design
• The memory is word-addressable. Each 16-bit word has its own location: word 0, word 1, etc.
• The 16-bit instruction code (machine code) has a format:
  – Top 4 bits define the operation code (opcode)
  – Bottom 12 bits define the memory address of the data (the operand)
• This machine can address up to 2^12 = 4k words = 8k bytes of data

address   data
0         0123(16)
1         7777(16)
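The opcode/address split described above can be sketched in a few lines of Python. This is only an illustration; the function name `decode` is hypothetical and not part of MU0.

```python
def decode(word):
    """Split a 16-bit MU0 instruction word into (opcode, address)."""
    opcode = (word >> 12) & 0xF   # top 4 bits
    address = word & 0xFFF        # bottom 12 bits
    return opcode, address

opcode, address = decode(0x202F)
print(opcode, hex(address))   # 2 0x2f
print(2 ** 12)                # 4096 addressable words
```

Masking with 0xFFF is what limits the operand address to the 4k-word range.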
MU0 Instruction Set

Instruction   Opcode (hex)   Effect                 Meaning
LDA S         0000 (0)       A := mem[S]            LoaD A
STA S         0001 (1)       mem[S] := A            STore A
ADD S         0010 (2)       A := A + mem[S]        ADD to A
SUB S         0011 (3)       A := A - mem[S]        SUBtract from A
JMP S         0100 (4)       PC := S                JuMP
JGE S         0101 (5)       if A ≥ 0, PC := S      Jump if Greater than or Equal
JNE S         0110 (6)       if A ≠ 0, PC := S      Jump if Not Equal
STP           0111 (7)       stop                   SToP

• mem[S]: contents of memory location with address S
• Think of memory locations as being an array; here S is the array index
• A is the single 16-bit CPU register
• S is a number taken from the instruction, in the range 0-4095 (000(16)-FFF(16))
Instruction Set Design
4-address instructions
  Example: ADD d, s1, s2, next_i ; d := s1 + s2
  Format: | Function | Dest addr | Op 1 addr | Op 2 addr | Next_i addr |
3-address instructions: DIY
2-address instructions: DIY
Check your answers
• 1-address instructions
  Example: ADD s1
  Format: | Function | Op 1 addr |
• 0-address instructions
  Example: ADD ; top_of_stack := top_of_stack + next_on_stack
  Format: | Function |
Let's start with a program...

Human-readable (mnemonic) assembly code    Machine code
LDA 02E                                    002E
ADD 02F                                    202F
STA 030                                    1030
STP                                        7???

Note: we follow tradition and use hex notation for addresses and data.
Instructions execute in sequence.
Contd.

Address   Assembly mnemonics   Machine code
000       LDA 02E              0 02E           (program)
001       ADD 02F              2 02F
002       STA 030              1 030
003       STP                  7 000
004-006   --                   --
02E       0AA0                 0AA0            (data)
02F       0110                 0110
030       --                   --
...       --                   --

• Initially, we assume PC = 0, data and instructions are loaded in memory as shown, and the other CPU registers are undefined.
[Figure: MU0 block diagram with PC, A, IR, control and ALU connected to memory by the address bus and data bus; PC = 0]
Instruction 1: LDA 02E

                                           PC   IR     A
Cycle 1 (fetch instr and increment PC)     1    002E   undefined
Cycle 2 (execute instruction)              1    002E   0AA0

Memory is unchanged: mem[02E] = 0AA0, mem[02F] = 0110.
NB: data shown is after each cycle has completed, so PC is one more than the PC used to fetch the instruction.
Instruction 2: ADD 02F

           PC   IR     A
Cycle 1    2    202F   0AA0
Cycle 2    2    202F   0BB0

Memory is unchanged: mem[02E] = 0AA0, mem[02F] = 0110, mem[030] = --.
Instruction 3: STA 030

           PC   IR     A
Cycle 1    3    1030   0BB0
Cycle 2    3    1030   0BB0

After cycle 2, mem[030] = 0BB0.
Instruction 4: STP

           PC   IR     A
Cycle 1    4    7000   0BB0

Final memory: mem[02E] = 0AA0, mem[02F] = 0110, mem[030] = 0BB0.
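The fetch/execute walkthrough above can be reproduced with a small simulator. The following is a minimal sketch, not a reference implementation: the function name `run_mu0` is hypothetical, A is assumed to start at 0 (the slides leave it undefined), and JGE is assumed to treat A as a 16-bit two's-complement number.

```python
def run_mu0(mem, max_steps=1000):
    """Run a MU0 program stored in `mem` (list of 16-bit words), starting at PC = 0."""
    pc, acc = 0, 0                        # assumption: A starts at 0
    for _ in range(max_steps):
        ir = mem[pc]                      # cycle 1: fetch instruction...
        pc = (pc + 1) & 0xFFF             # ...and increment PC
        op, s = ir >> 12, ir & 0xFFF      # opcode (top 4 bits), address (bottom 12)
        if op == 0:                       # LDA S
            acc = mem[s]
        elif op == 1:                     # STA S
            mem[s] = acc
        elif op == 2:                     # ADD S
            acc = (acc + mem[s]) & 0xFFFF
        elif op == 3:                     # SUB S
            acc = (acc - mem[s]) & 0xFFFF
        elif op == 4:                     # JMP S
            pc = s
        elif op == 5:                     # JGE S (assumed signed: top bit clear)
            if acc < 0x8000:
                pc = s
        elif op == 6:                     # JNE S
            if acc != 0:
                pc = s
        elif op == 7:                     # STP
            break
        else:
            raise ValueError(f"undefined opcode {op}")
    return pc, acc

mem = [0] * 4096
mem[0:4] = [0x002E, 0x202F, 0x1030, 0x7000]   # LDA 02E, ADD 02F, STA 030, STP
mem[0x02E], mem[0x02F] = 0x0AA0, 0x0110
pc, acc = run_mu0(mem)
print(pc, hex(acc), hex(mem[0x030]))          # 4 0xbb0 0xbb0
```

The final values match the last slide of the walkthrough: PC = 4, A = 0BB0 and mem[030] = 0BB0.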
Task: a := b+c

1 operand (MU0): a, b, c stored in memory
  a: mem[102]   b: mem[101]   c: mem[100]
  LDA mem[100]
  ADD mem[101]
  STA mem[102]

REGISTERS: have e.g. 8 accumulators R0-R7; a, b, c stored in registers

2 operand (AVR)
  a: R2   b: R1   c: R0
  Syntax: ADD R0,R1 ; R0 := R0+R1
          MOV R0,R1 ; R0 := R1
  ADD R0, R1
  MOV R2, R0

3 operand (ARM)
  a: R2   b: R1   c: R0
  Syntax: ADD R0,R1,R2 ; R0 := R1+R2
  ADD R2, R1, R0
Design Strategies
Modern CPU Design
1. Why the move from CISC to RISC?
   • technology factors increase expense of chip design
   • better compilers, better software engineers
   • Simple ISA better for concurrent execution
2. Load / Store architecture
   • Lots of registers: only go to main memory when really necessary.
3. Concurrent execution of instructions for greater speed
   • multiple function units (ALUs, etc.): superscalar or VLIW (EPIC); examples: Pentium & Athlon
   • “production line” arrangement (pipeline): all modern CPUs
Nibbles, Bytes, Words
• Internal datapaths inside computers can have different widths, for example 4-bit, 8-bit, 16-bit or 32-bit.
• For example: the ARM processor uses a 32-bit internal datapath
• WORD = 32 bits for ARM, 16 bits for MU0, 64 bits for latest x86 processors
• BYTE (8 bits) and NIBBLE (4 bits) are architecture independent
[Figure: a 32-bit word with bits numbered 31 (MSB) down to 0 (LSB), marked at 31, 24, 23, 16, 15, 8, 7, 0, showing the extent of a word, a byte and a nibble]
Byte addresses for words
• Most computer systems now use little-endian byte addressing, in which the least-significant byte has the lower address.
[Figure: 16-bit little-endian memory holding bytes 0-7, with the LSB of each word at the lower byte address. Consecutive word addresses are separated by 2 (0, 2, 4, 6, 8); the alternative word numbers 0, 1, 2, 3, 4 are not used.]
0x01234567 will be stored as follows.
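As a quick check of the byte order, the layout of 0x01234567 can be printed with Python's standard `struct` module. This is a small sketch: the offsets are relative to wherever the word happens to be stored, and the big-endian packing is included only for contrast.

```python
import struct

value = 0x01234567
little = struct.pack("<I", value)   # little-endian: least-significant byte first
big = struct.pack(">I", value)      # big-endian, shown only for comparison

for offset, byte in enumerate(little):
    print(f"address +{offset}: 0x{byte:02X}")
# address +0: 0x67
# address +1: 0x45
# address +2: 0x23
# address +3: 0x01
```

In the little-endian layout the least-significant byte 0x67 lands at the lowest address.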
ARM Powered Products
The History of ARM
• Developed at Acorn Computers Limited, of Cambridge, England, between 1983 and 1985
• Problems with CISC:
  – Slower than memory parts
  – Many clock cycles per instruction
The History of ARM (2)
• Solution: the Berkeley RISC I:
  – Competitive
  – Easy to develop (less than a year)
  – Cheap
  – Pointing the way to the future
Why learn ARM?
• Dominant architecture for embedded systems
• 32 bits => powerful & fast
• Efficient: very low power/MIPS
• Regular instruction set with many advanced features.
Beyond MU0 - A first look at ARM
• Complete instruction set.
• Larger address space
• Subroutine call mechanism
• Additional internal registers
• Interrupts, direct memory access (DMA), and cache memory.
  – Interrupts: allow external devices (e.g. mouse, keyboard) to interrupt the current program execution
  – DMA: allows external high-throughput devices (e.g. display card) to access memory directly rather than through the processor
  – Cache: a small amount of fast memory on the processor
The ARM Instruction Set
• Load-Store architecture
• Fixed-length (32-bit) instructions
• 3-operand instruction format (2 source operand regs, 1 result operand reg): ALU operations very powerful (can include shifts)
• Conditional execution of ALL instructions (v. clever idea!)
• Load-Store multiple registers in one instruction
• A single-cycle n-bit shift with ALU operation
• “Combines the best of RISC with the best of CISC”
Operating Modes
User mode
– Normal program execution mode
– System resources unavailable
– Mode can be changed by supervisor only
Supervisor modes
– Entered upon exception
– Full access to system resources
– Mode changed freely
ARM Programmer’s Model
• 16 x 32-bit registers
• R15 is equal to the PC
  – Its value is the current PC value
  – Writing to it causes a branch!
• R0-R14 are general purpose
  – R13, R14 have additional functions, described later
• Current Processor Status Register (CPSR)
  – Holds condition codes, i.e. status bits
[Figure: ARM visible registers r0-r12, r13 (stack pointer), r14 (link register), r15 (PC), and the CPSR with the N, Z, C, V flags in the top bits, an unused field, and the I, F, T and mode bits in bits 7-0]
ARM Programmer's Model (cont'd)
• CPSR is a special register; it cannot be read or written like other registers
  – The result of any data processing instruction can modify status bits (flags)
  – These flags are read to determine branch conditions etc.
• Main status bits (condition codes):
  – N (result was negative)
  – Z (result was zero)
  – C (result involved a carry-out)
  – V (result overflowed as signed number)
• Other fields described later
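The meaning of the four condition codes can be made concrete by modelling a 32-bit addition in Python. This is only a sketch of the arithmetic, with the hypothetical helper name `add_with_flags`; it does not model how any particular ARM core computes the flags.

```python
MASK = 0xFFFFFFFF

def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) for a flag-setting add."""
    a &= MASK
    b &= MASK
    full = a + b                      # may need 33 bits
    result = full & MASK
    n = result >> 31                  # N: bit 31 of the result
    z = int(result == 0)              # Z: result is zero
    c = int(full > MASK)              # C: carry out of bit 31
    # V: both operands have the same sign and the result's sign differs
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}

print(add_with_flags(0x7FFFFFFF, 1))   # signed overflow: N and V set
print(add_with_flags(0xFFFFFFFF, 1))   # wraps to zero: Z and C set
```

The two calls show that C and V are independent: the first overflows as a signed number without a carry-out, the second carries out without signed overflow.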
ARM's memory organization
• Byte addressed memory
• Maximum 2^32 bytes of memory
• A word = 32 bits, half-word = 16 bits
• Words aligned on 4-byte boundaries
NB: lowest byte address = LSB of word (“little-endian”)
[Figure: memory drawn as 32-bit words at word addresses 0, 4, 8, 12, 16, 20; word addresses follow the LSB byte address]
ARM Instruction Set (3)
ARM instruction set:
• Data processing instructions
• Data transfer instructions
• Block transfer instructions
• Multiply instructions
• Branching instructions
• Software interrupt instructions
Data Processing Instructions
• Arithmetic and logical operations
• 3-address format:
  – Two 32-bit operands (op1 is register, op2 is register or immediate)
  – 32-bit result placed in a register
• Barrel shifter for op2 allows full 32-bit shift within instruction cycle
Data Processing Instructions (2)
• Arithmetic operations:
  – ADD, ADC, SUB, SBC, RSB, RSC
• Bit-wise logical operations:
  – AND, EOR, ORR, BIC
• Register movement operations:
  – MOV, MVN
• Comparison operations:
  – TST, TEQ, CMP, CMN
Data Processing Instructions (3)
Condition codes + Data processing instructions + Barrel shifter = Powerful tools for efficiently coded programs
Data Processing Instructions (4)
Example: if (Z==1) R1=R2+(R3*4) compiles to
  ADDEQS R1, R2, R3, LSL #2
(SINGLE INSTRUCTION!)
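What that single conditional add-with-shift does can be sketched in Python. This is a behavioural model only: the helper name `add_eq_lsl` and the dictionary of registers are hypothetical, and the flag update requested by the S suffix is left out for brevity.

```python
MASK = 0xFFFFFFFF

def add_eq_lsl(regs, z_flag, rd, rn, rm, shift):
    """Model a conditional add: if Z is set, rd := rn + (rm << shift)."""
    if z_flag:                                   # EQ condition: execute only when Z == 1
        shifted = (regs[rm] << shift) & MASK     # barrel shifter: LSL #shift
        regs[rd] = (regs[rn] + shifted) & MASK
    return regs

regs = {"R1": 0, "R2": 10, "R3": 3}
add_eq_lsl(regs, True, "R1", "R2", "R3", 2)
print(regs["R1"])   # 22, i.e. 10 + 3*4

regs = {"R1": 0, "R2": 10, "R3": 3}
add_eq_lsl(regs, False, "R1", "R2", "R3", 2)
print(regs["R1"])   # 0, the instruction is skipped when Z is clear
```

The shift by 2 implements the multiplication by 4, so the test, the multiply and the add all collapse into one instruction.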
Data Transfer Instructions
• Load/store instructions
• Used to move signed and unsigned Word, Half Word and Byte to and from registers
• Can be used to load PC (if target address is beyond branch instruction range)

LDR     Load Word                STR    Store Word
LDRH    Load Half Word           STRH   Store Half Word
LDRSH   Load Signed Half Word
LDRB    Load Byte                STRB   Store Byte
LDRSB   Load Signed Byte

Note: signed forms exist only for loads, where the value is sign-extended to 32 bits. ARM has no STRSH or STRSB; STRH and STRB store the low half word or byte whatever its sign.
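Block Transfer Instructions
• Load/Store Multiple instructions (LDM/STM)
• Whole register bank or a subset copied to memory or restored with a single instruction
[Figure: registers R0, R1, R2, ..., R14, R15 and memory locations Mi, Mi+1, Mi+2, ..., Mi+14, Mi+15, with STM copying registers to memory and LDM copying memory to registers]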
Swap Instruction
• Exchanges a word between a register and memory
• Two cycles but single atomic action
• Support for RT semaphores
Modifying the Status Registers
• Only indirectly
• MRS moves contents from CPSR/SPSR to selected GPR
• MSR moves contents from selected GPR to CPSR/SPSR
• Only in privileged modes
[Figure: general-purpose registers R0-R15 and the CPSR/SPSR, with MRS copying a status register to a GPR and MSR copying a GPR to a status register]
Multiply Instructions
• Integer multiplication (32-bit result)
• Long integer multiplication (64-bit result)
• Built in Multiply Accumulate Unit (MAC)
• Multiply and accumulate instructions add product to running total
Multiply Instructions (2)
• Instructions:

MUL     Multiply                        32-bit result
MLA     Multiply accumulate             32-bit result
UMULL   Unsigned multiply               64-bit result
UMLAL   Unsigned multiply accumulate    64-bit result
SMULL   Signed multiply                 64-bit result
SMLAL   Signed multiply accumulate      64-bit result
Software Interrupt
• SWI instruction
  – Forces CPU into supervisor mode
  – Usage: SWI #n
• Maximum 2^24 calls
• Suitable for running privileged code and making OS calls

Instruction format:
bits 31-28   bits 27-24   bits 23-0
Cond         Opcode       Ordinal
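The three-field layout above explains the call limit: a 24-bit ordinal gives 2^24 distinct values. The sketch below packs the fields in Python. The function name `encode_swi` is hypothetical, and the field values used (opcode 1111 for SWI, condition 1110 for "always") come from the classic ARM encoding rather than from the slide.

```python
def encode_swi(ordinal, cond=0b1110):
    """Pack cond (bits 31-28), opcode 1111 (bits 27-24) and a 24-bit ordinal."""
    if not 0 <= ordinal < 2 ** 24:
        raise ValueError("ordinal must fit in 24 bits")
    if not 0 <= cond < 16:
        raise ValueError("cond must fit in 4 bits")
    return (cond << 28) | (0b1111 << 24) | ordinal

print(hex(encode_swi(0x123456)))   # 0xef123456
print(2 ** 24)                     # 16777216 possible ordinals
```

The processor ignores the ordinal itself; it is there for the supervisor-mode handler to read and dispatch on.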
Branching Instructions
• Branch (B): jumps forwards/backwards up to 32 MB
• Branch link (BL): same, and saves the return address (the address of the instruction after the BL) in LR
• Suitable for function call/return
• Condition codes for conditional branches
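The 32 MB figure follows from the branch encoding. Assuming the classic ARM format, which is not stated on the slide, B and BL carry a signed 24-bit offset counted in 4-byte words. A short Python sketch with the hypothetical helper `branch_range_bytes` works out the reach:

```python
def branch_range_bytes(offset_bits=24, instr_bytes=4):
    """Return (lowest, highest) byte displacement of a signed word offset."""
    lowest = -(2 ** (offset_bits - 1)) * instr_bytes
    highest = (2 ** (offset_bits - 1) - 1) * instr_bytes
    return lowest, highest

lowest, highest = branch_range_bytes()
print(lowest, highest)      # -33554432 33554428
print(-lowest // 2 ** 20)   # 32 (MB backwards)
```

Targets outside this range need the address loaded into the PC another way, for example with a load instruction.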
Branching Instructions (2)
• Branch exchange (BX) and Branch link exchange (BLX): same as B/BL + exchange instruction set (ARM ↔ THUMB)
• Only way to swap sets
Thumb Instruction Set
• Compressed form of ARM
  – Instructions stored as 16-bit,
  – Decompressed into ARM instructions and
  – Executed
• Lower performance (ARM 40% faster)
• Higher density (THUMB saves 30% space)
• Optimal: “interworking” (combining two sets), compiler supported
THUMB Instruction Set (2)
• More traditional:
  – No condition codes on instructions (apart from conditional branches)
  – Two-address data processing instructions
• Access to the high registers R8 – R15 restricted to
  – MOV, ADD, CMP
• PUSH/POP for stack manipulation
  – Descending stack (SP hardwired to R13)
THUMB Instruction Set (3)
• No MSR and MRS, must change to ARM to modify CPSR (change using BX or BLX)
• ARM entered automatically after RESET or entering exception mode
• Maximum 256 SWI calls (8-bit ordinal, 0 to 255)
ARM Assembly Quick Recap

MOV ra, rb          ra := rb
MOV ra, #n          ra := n          (n decimal in range -128 to 127; other values possible, see later)

ADD ra, rb, rc      ra := rb + rc
ADD ra, rb, #n      ra := rb + n     (SUB => - instead of +)

CMP ra, rb          set status bits on ra-rb
CMP ra, #n          set status bits on ra-n
                    (CMP is like SUB but has no destination register and sets status bits)

B label             branch to label  (BL label is branch & link)
BEQ label           branch to label if zero
BNE label           branch if not zero
BMI label           branch if negative
BPL label           branch if zero or plus
                    (Branch conditions apply to the result of the last instruction to set status bits: ADDS/SUBS/MOVS/CMP etc.)

LDR ra, label       ra := mem[label]
STR ra, label       mem[label] := ra
ADR ra, label       ra := address of label
LDR ra, [rb]        ra := mem[rb]
STR ra, [rb]        mem[rb] := ra    (LDRB/STRB => byte transfer)

Other address modes:
[rb,#n]     => mem[rb+n]
[rb,#n]!    => mem[rb+n], rb := rb+n
[rb],#n     => mem[rb], rb := rb+n
[rb,ri]     => mem[rb+ri]
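The difference between the offset, pre-indexed and post-indexed address modes is easiest to see by tracking the base register. The following Python sketch models only the address arithmetic; the function name `effective_address` and the mode labels are hypothetical.

```python
def effective_address(regs, rb, n, mode):
    """Return the address accessed; update regs[rb] if the mode writes back.

    mode 'offset' models [rb,#n]  : access rb+n, rb unchanged
    mode 'pre'    models [rb,#n]! : access rb+n, then rb := rb+n
    mode 'post'   models [rb],#n  : access rb,   then rb := rb+n
    """
    base = regs[rb]
    if mode == "offset":
        return base + n
    if mode == "pre":
        regs[rb] = base + n
        return base + n
    if mode == "post":
        regs[rb] = base + n
        return base
    raise ValueError(f"unknown mode {mode!r}")

regs = {"R1": 100}
print(effective_address(regs, "R1", 4, "offset"), regs["R1"])   # 104 100
print(effective_address(regs, "R1", 4, "pre"), regs["R1"])      # 104 104
print(effective_address(regs, "R1", 4, "post"), regs["R1"])     # 104 108
```

The write-back forms make it possible to step through an array with one instruction per element, since the base register is advanced as part of the load or store.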
MU0 to ARM

Operation                               MU0      ARM
A := mem[S]       R0 := mem[S]          LDA S    LDR R0, S
mem[S] := A       mem[S] := R0          STA S    STR R0, S
A := A + mem[S]   R0 := R0 + mem[S]     ADD S    LDR R1, S
                                                 ADD R0, R0, R1
R0 := S                                 n/a      MOV R0, #S
R0 := R1 + R2                           n/a      ADD R0, R1, R2
PC := S                                 JMP S    B S

Registers: MU0 has A; ARM has R0, R1, R2, ...
Programs for ARM
Introduction to Processor Design and ARM Processor
