ARM Zero To Hero: Level I
By: Mohammed Gomaa
ARM Programming Model: ARM Cortex-M4F
Road Map
• Cortex-M4 Memory Map
  – Cortex-M4 Memory Map
  – Bit-band Operations
  – Cortex-M4 Program Image and Endianness
• ARM Cortex-M4 Processor Instruction Set
  – ARM and Thumb Instruction Set
  – Cortex-M4 Instruction Set
Cortex-M4 Memory Map
• The Cortex-M4 processor has 4 GB of memory address space
  – Support for bit-band operation (detailed later)
• The 4 GB memory space is architecturally defined as a number of regions
  – Each region has a recommended usage
  – This makes it easy for software programmers to port code between different devices
• Despite the default memory map, the actual usage of each region can be flexibly defined by the user, except for some fixed memory addresses, such as the internal Private Peripheral Bus
M4 Memory Map (cont.)
• Code Region
  – Primarily used to store program code
  – Can also be used for data memory
  – On-chip memory, such as on-chip FLASH
• SRAM Region
  – Primarily used to store data, such as heaps and stacks
  – Can also be used for program code
  – On-chip memory; despite the name "SRAM", the actual device could be SRAM, SDRAM or another type
• Peripheral Region
  – Primarily used for peripherals, such as Advanced High-performance Bus (AHB) or Advanced Peripheral Bus (APB) peripherals
• External RAM Region
  – Primarily used to store large data blocks, or memory caches
  – Off-chip memory, slower than the on-chip SRAM region
• External Device Region
  – Primarily used to map to external devices
  – Off-chip devices, such as an SD card
• Internal Private Peripheral Bus (PPB)
  – Used inside the processor core for internal control
  – Within the PPB, a special range of memory is defined as the System Control Space (SCS)
  – The Nested Vectored Interrupt Controller (NVIC) is part of the SCS
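Cortex-M4 Memory Map Example
[Figure: block diagram of a chip built around a Cortex-M4. Inside the processor, the PPB contains the SCS, NVIC and debug control. On the AHB bus within the chip silicon: on-chip FLASH (Code Region), on-chip SRAM (SRAM Region), and Timer, UART and GPIO (Peripheral Region). An external memory interface (External RAM Region) connects to external SRAM and FLASH; an external device interface (External Device Region) connects to an external LCD and SD card.]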
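The region list above can be sketched as a small address classifier. This is a minimal illustration, not part of the original slides: the boundary addresses are the default ARMv7-M memory map (the slides name the regions but do not give their ranges), and the names `mem_region_t` and `classify_address` are made up for this example.

```c
#include <stdint.h>

/* Regions of the default ARMv7-M memory map. The boundaries below come from
 * the architecture definition, not from the slides. All names here are
 * hypothetical and exist only for this sketch. */
typedef enum {
    REGION_CODE,            /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,            /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,      /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXTERNAL_RAM,    /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXTERNAL_DEVICE, /* 0xA0000000 - 0xDFFFFFFF */
    REGION_PPB,             /* 0xE0000000 - 0xE00FFFFF */
    REGION_VENDOR           /* 0xE0100000 - 0xFFFFFFFF */
} mem_region_t;

/* Return the region that a 32-bit address falls into. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return REGION_CODE;
    if (addr <= 0x3FFFFFFFu) return REGION_SRAM;
    if (addr <= 0x5FFFFFFFu) return REGION_PERIPHERAL;
    if (addr <= 0x9FFFFFFFu) return REGION_EXTERNAL_RAM;
    if (addr <= 0xDFFFFFFFu) return REGION_EXTERNAL_DEVICE;
    if (addr <= 0xE00FFFFFu) return REGION_PPB;
    return REGION_VENDOR;
}
```

For instance, an NVIC register address such as 0xE000E100 falls in the PPB range, which matches the statement that the NVIC is part of the SCS inside the PPB.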
Bit-band Operations
• A bit-band operation allows a single load/store operation to access a single bit in memory, for example, to change a single bit of a 32-bit data word:
  – Normal operation without bit-band (read-modify-write)
    – Read the value of the 32-bit data
    – Modify a single bit of the 32-bit value (keep other bits unchanged)
    – Write the value back to the address
  – Bit-band operation
    – Directly write a single bit (0 or 1) to the "bit-band alias address" of the data
• Bit-band alias address
  – Each bit-band alias address is mapped to a real data address
  – When writing to the bit-band alias address, only a single bit of the data is changed
Bit-band Operation Example
For example, to set bit [3] of the word at address 0x20000000:

    ; Read-Modify-Write operation
    LDR   R1, =0x20000000   ; Set up address
    LDR   R0, [R1]          ; Read
    ORR.W R0, #0x8          ; Modify bit
    STR   R0, [R1]          ; Write

    ; Bit-band operation
    LDR   R1, =0x2200000C   ; Set up address
    MOV   R0, #1            ; Load data
    STR   R0, [R1]          ; Write
• Read-Modify-Write operation
  – Read the real data address (0x20000000)
  – Modify the desired bit (retain other bits unchanged)
  – Write the modified data back
• Bit-band operation
  – Directly set the bit by writing '1' to address 0x2200000C, which is the alias address of the fourth bit of the 32-bit data at 0x20000000
  – In effect, this single instruction is mapped to 2 bus transfers: read data from 0x20000000 to the buffer, and then write to 0x20000000 from the buffer with bit [3] set
Bit-band Alias Address
• Each bit of the 32-bit data is mapped one-to-one to a bit-band alias address
  – For example, the fourth bit (bit [3]) of the data at 0x20000000 is mapped to the bit-band alias address 0x2200000C
  – Hence, to set bit [3] of the data at 0x20000000, we only need to write '1' to address 0x2200000C
  – In Cortex-M4, there are two pre-defined bit-band alias regions: one for the SRAM region, and one for the peripheral region
Bit-band Alias Address (cont.)
• SRAM region
  – A 32 MB memory space (0x22000000 – 0x23FFFFFF) is used as the bit-band alias region for 1 MB of data (0x20000000 – 0x200FFFFF)
• Peripheral region
  – A 32 MB memory space (0x42000000 – 0x43FFFFFF) is used as the bit-band alias region for 1 MB of data (0x40000000 – 0x400FFFFF)
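The mapping between a data bit and its alias address follows from these ranges: every bit gets its own 32-bit alias word, so each data byte expands to 32 alias bytes and each bit to 4. The sketch below computes the alias address under that reading; the function name `bitband_alias`, the macro names, and the choice to return 0 for addresses outside the two 1 MB regions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Region bases as given in the slides; the macro names are hypothetical. */
#define SRAM_BB_BASE      0x20000000u
#define SRAM_ALIAS_BASE   0x22000000u
#define PERIPH_BB_BASE    0x40000000u
#define PERIPH_ALIAS_BASE 0x42000000u
#define BB_REGION_SIZE    0x00100000u /* 1 MB of bit-band data per region */

/* Compute the bit-band alias address for bit `bit` of the data at `addr`.
 * `bit` counts from the byte at `addr`, so use a word-aligned `addr` when
 * passing bit numbers above 7. Returns 0 if `addr` is outside both
 * bit-band regions (a convention chosen for this sketch only). */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t bb_base;
    uint32_t alias_base;

    assert(bit < 32u);

    if (addr >= SRAM_BB_BASE && addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        bb_base = SRAM_BB_BASE;
        alias_base = SRAM_ALIAS_BASE;
    } else if (addr >= PERIPH_BB_BASE && addr - PERIPH_BB_BASE < BB_REGION_SIZE) {
        bb_base = PERIPH_BB_BASE;
        alias_base = PERIPH_ALIAS_BASE;
    } else {
        return 0u;
    }

    /* 32 alias bytes per data byte, 4 alias bytes per bit. */
    return alias_base + (addr - bb_base) * 32u + (uint32_t)bit * 4u;
}
```

With these definitions, `bitband_alias(0x20000000u, 3)` evaluates to 0x2200000C, the alias address used in the example above.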
Benefits of Bit-Band Operations
• Faster bit operations
• Fewer instructions
• Atomic operation, which avoids hazards
  – For example, if an interrupt is triggered and served during a read-modify-write sequence, and the interrupt service routine (ISR) modifies the same data, a data conflict occurs
[Figure: timeline of the conflict. The main program reads the data at 0x00 and starts modifying bit [1]; an interrupt occurs; the ISR reads the data at 0x00, modifies bit [1] and writes the data back; the interrupt returns; the main program writes its data back. Bit [1] as modified by the ISR is overwritten by the main program.]
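The hazard can be replayed on a host machine by writing out the interleaving step by step. The sketch below is a plain simulation, not target code: the "interrupt" is simply code placed between the main program's read and its write-back. As an assumption for clarity, the main program sets bit [0] and the ISR sets bit [1], so the lost update is visible in the result. Both function names are hypothetical.

```c
#include <stdint.h>

/* Read-modify-write with an interrupt between the read and the write-back.
 * Returns the final value of the shared word. */
uint32_t rmw_with_interrupt(uint32_t initial)
{
    uint32_t word = initial;      /* shared data */

    uint32_t main_copy = word;    /* main program: read */

    /* Interrupt occurs: the ISR performs its own read-modify-write. */
    uint32_t isr_copy = word;     /* ISR: read */
    isr_copy |= (1u << 1);        /* ISR: modify bit [1] */
    word = isr_copy;              /* ISR: write back */

    /* Interrupt returns: main continues with its stale copy. */
    main_copy |= (1u << 0);       /* main program: modify bit [0] */
    word = main_copy;             /* main program: write back, losing bit [1] */

    return word;
}

/* The same two updates when each one is a single indivisible bit write,
 * which is the behaviour a bit-band store provides. */
uint32_t atomic_with_interrupt(uint32_t initial)
{
    uint32_t word = initial;

    word |= (1u << 1);            /* ISR: single-bit update */
    word |= (1u << 0);            /* main program: single-bit update */

    return word;
}
```

Starting from 0, the first function ends with only bit [0] set, while the second keeps both bits, which is the difference the slide describes.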
Cortex-M4 Program Image
• The program image in Cortex-M4 contains
  – Vector table: the starting addresses of exceptions (vectors) and the initial value of the main stack pointer (MSP)
  – C start-up routine
  – Program code: application code and data
  – C library code: program code for C library functions
Cortex-M4 Program Image (cont.)
• After reset, the processor:
  – First reads the initial MSP value
  – Then reads the reset vector
  – Branches to the start of the program execution address (reset handler)
  – Subsequently executes program instructions
[Figure: reset sequence. Reset → fetch initial value for MSP (read address 0x00000000) → fetch reset vector (read address 0x00000004) → fetch 1st instruction (read address of reset vector) → fetch 2nd instruction (read subsequent instructions).]
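The first two reads of the reset sequence can be mimicked on a host by pulling the same two words out of a program image held in a byte array. This is a rough sketch under stated assumptions: the image is little endian, the image bytes used with it are invented sample values, and `reset_info_t` and `read_reset_info` are hypothetical names. Bit 0 of the reset vector is masked off because on Cortex-M it marks Thumb state rather than forming part of the instruction address.

```c
#include <stddef.h>
#include <stdint.h>

/* The two values the processor fetches right after reset. */
typedef struct {
    uint32_t initial_msp;   /* word at offset 0x00000000 */
    uint32_t reset_handler; /* word at offset 0x00000004, Thumb bit cleared */
} reset_info_t;

/* Assemble a 32-bit value from four bytes stored in little-endian order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Extract the initial MSP and the reset handler address from the start of a
 * program image. Returns 0 on success, -1 if the arguments are invalid or
 * the image is too short to hold both words. */
int read_reset_info(const uint8_t *image, size_t len, reset_info_t *out)
{
    if (image == NULL || out == NULL || len < 8u) {
        return -1;
    }
    out->initial_msp = read_le32(image);
    out->reset_handler = read_le32(image + 4) & ~(uint32_t)1u;
    return 0;
}
```

On real hardware these reads are done by the processor itself; the helper is only useful for inspecting an image file offline.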
Cortex-M4 Endianness
• Endianness refers to the order in which bytes are stored in memory
  – Little endian: the lowest byte of word-size data is stored in bit 0 to bit 7
  – Big endian: the lowest byte of word-size data is stored in bit 24 to bit 31
• Cortex-M4 supports both little endian and big endian
• However, endianness only exists at the hardware level
Little endian 32-bit memory
  Address   [31:24]  [23:16]  [15:8]  [7:0]
  Word 3    Byte3    Byte2    Byte1   Byte0
  Word 2    Byte3    Byte2    Byte1   Byte0
  Word 1    Byte3    Byte2    Byte1   Byte0
Big endian 32-bit memory
  Address   [31:24]  [23:16]  [15:8]  [7:0]
  Word 3    Byte0    Byte1    Byte2   Byte3
  Word 2    Byte0    Byte1    Byte2   Byte3
  Word 1    Byte0    Byte1    Byte2   Byte3
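The two layouts can be made concrete by writing a 32-bit value into a byte array in each order. The sketch below is host-side C with hypothetical function names; index 0 of the array plays the role of Byte0, the byte at the lowest address.

```c
#include <stdint.h>

/* Little endian: Byte0 (lowest address) holds bits [7:0] of the word. */
void store_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value & 0xFFu);
    out[1] = (uint8_t)((value >> 8) & 0xFFu);
    out[2] = (uint8_t)((value >> 16) & 0xFFu);
    out[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Big endian: Byte0 (lowest address) holds bits [31:24] of the word. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)((value >> 24) & 0xFFu);
    out[1] = (uint8_t)((value >> 16) & 0xFFu);
    out[2] = (uint8_t)((value >> 8) & 0xFFu);
    out[3] = (uint8_t)(value & 0xFFu);
}

/* Report whether the machine running this code stores words little endian,
 * by inspecting the first byte of a known value. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    return *(const uint8_t *)&probe == 1u;
}
```

Storing 0x11223344 gives the byte sequence 44 33 22 11 in little endian and 11 22 33 44 in big endian.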
ARM and Thumb® Instruction Set
• Early ARM instruction set
  – 32-bit instruction set, called the ARM instructions
  – Powerful, with good performance
  – Requires more program memory than 8-bit and 16-bit processors
  – Higher power consumption
• Thumb-1 instruction set
  – 16-bit instruction set, first used in the ARM7TDMI processor in 1995
  – Provides a subset of the ARM instructions, giving better code density compared to a 32-bit RISC architecture
  – Code size is reduced by ~30%, but performance is also reduced by ~20%
• Mix of ARM and Thumb-1 instruction sets
  – Benefits from both 32-bit ARM (high performance) and 16-bit Thumb-1 (high code density)
  – A multiplexer is used to switch between two states: ARM state (32-bit) and Thumb state (16-bit), which incurs a switching overhead
[Figure labels: incoming instructions, executing instructions]
Thumb-2 instruction set
• Consists of both 32-bit Thumb instructions and the original 16-bit Thumb-1 instructions
• Compared to the 32-bit ARM instruction set, code size is reduced by ~26%, while keeping similar performance
• Capable of handling all processing requirements in one operation state
Cortex-M4 Instruction Set
• Cortex-M4 processor
  – ARMv7-M architecture
  – Supports 32-bit Thumb-2 instructions
  – Can handle all processing requirements in one operation state (Thumb state)
  – Compared with traditional ARM processors (which use state switching), advantages include:
    * No state switching overhead: both execution time and instruction space are saved
    * No need to separate ARM code and Thumb code source files, which makes the development and maintenance of software easier
    * Easier to get optimized efficiency and performance
Cortex-M4 Instruction Set (cont.)
• ARM assembly syntax:
    label mnemonic operand1, operand2, … ; Comments
  – Label is used as a reference to an address location
  – Mnemonic is the name of the instruction
  – Operand1 is the destination of the operation
  – Operand2 is normally the source of the operation
  – Comments are written after ";" and do not affect the program
  – For example:
      MOVS R3, #0x11 ; Set register R3 to 0x11
  – Note that the assembly code can be assembled by either the ARM assembler (armasm) or assembly tools from a variety of vendors (e.g. the GNU tool chain). When using the GNU tool chain, the syntax for labels and comments is slightly different
Cortex-M4 Suffix
• Some instructions can be followed by suffixes to update processor flags or to execute the instruction only on a certain condition
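As a loose illustration of what the two kinds of suffix mean, the sketch below models the Negative and Zero flags that a flag-setting move such as MOVS derives from its result, and the tests that the EQ, NE and MI condition suffixes apply to those flags. It is a host-side model only: the Carry and Overflow flags are deliberately left out, and the type and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the N and Z flags are modelled in this sketch. */
typedef struct {
    bool n; /* Negative: bit 31 of the result is set */
    bool z; /* Zero: the result is 0 */
} nz_flags_t;

/* Flags as an S-suffixed move would derive them from its 32-bit result. */
nz_flags_t flags_after_movs(uint32_t result)
{
    nz_flags_t f;
    f.n = (result >> 31) != 0u;
    f.z = (result == 0u);
    return f;
}

/* Condition suffix EQ: execute when Z is set. */
bool cond_eq(nz_flags_t f) { return f.z; }

/* Condition suffix NE: execute when Z is clear. */
bool cond_ne(nz_flags_t f) { return !f.z; }

/* Condition suffix MI: execute when N is set. */
bool cond_mi(nz_flags_t f) { return f.n; }
```

Under this model, the MOVS R3, #0x11 example leaves both flags clear, so an instruction with the NE suffix placed after it would execute and one with EQ would not.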
C Calling Assembly
• In most applications, the most common scenario involving hand-written assembly code, if it is needed at all, is C calling assembly
• The rules are defined formally by the ARM Architecture Procedure Call Standard (AAPCS), which specifies:
  – Which registers must be saved and restored
  – How to call procedures
  – How to return from procedures
AAPCS Register Use Conventions
• Make it easier to create modular, isolated and integrated code
• Scratch registers are not expected to be preserved upon returning from a called subroutine
  – This applies to r0–r3
• Preserved ("variable") registers are expected to have their original values upon returning from a called subroutine
  – This applies to r4–r8, r10–r11
  – Use PUSH {r4,..} and POP {r4,...}
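To connect these conventions to C, the sketch below shows a hypothetical four-argument routine, with comments describing where its arguments and result would live when it is compiled for a Cortex-M target under AAPCS. The function names are invented, and the assembly in the comment is one possible hand-written equivalent in armasm-style syntax, not code taken from the slides.

```c
#include <stdint.h>

/* Hypothetical leaf routine. Under AAPCS the first four integer arguments
 * are passed in the scratch registers:
 *   a -> r0, b -> r1, c -> r2, d -> r3
 * and the result is returned in r0. */
int32_t sum4(int32_t a, int32_t b, int32_t c, int32_t d)
{
    return a + b + c + d;
}

/* A hand-written equivalent could look like this:
 *
 *   sum4    ADD   r0, r0, r1    ; r0 = a + b
 *           ADD   r0, r0, r2    ; r0 = a + b + c
 *           ADD   r0, r0, r3    ; r0 = a + b + c + d
 *           BX    lr            ; return, result in r0
 *
 * It only touches r0-r3, so nothing has to be saved. A routine that needed
 * r4 or above would wrap its body in PUSH and POP of those registers. */

/* The caller's view: after the call, only the preserved registers are
 * guaranteed to hold their earlier values; r0-r3 may have changed. */
int32_t call_sum4(void)
{
    return sum4(1, 2, 3, 4);
}
```

Because the C prototype is all the caller sees, the C version can later be replaced by an assembly implementation without changing any calling code, as long as the assembly follows the same register rules.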
AAPCS Core Register Use
Arm cortex-m4 programmer model
