Introduction to
Assembly Language
2nd Semester SY 2009-2010
Benjie A. Pabroa
What is Assembly
Language
"High"-level languages such as BASIC,
FORTRAN, Pascal, Lisp, APL, etc. are
designed to ease the strain of
programming by providing the user with a
set of somewhat sophisticated operations
that are easily accessed
Assembly as Low-level
language
The lesson we derive is this: a very low-level
language might be very flexible and
efficient (in terms of speed and memory
use), but might be very difficult to program
in since no sophisticated operations are
provided and since the programmer must
understand in detail the operation of the
computer
Assembly language is essentially the lowest
possible level of language.
Built-in Features
the ability to read the values stored at
various "memory locations",
the ability to write a new value into a
memory location,
the ability to do integer arithmetic of limited
precision (add, subtract, multiply, divide),
The ability to do logical operations (or, and,
not, xor),
and the ability to "jump" to programs stored
at various locations in the computer's
memory.
Features not included
The ability to perform graphics
The ability to access files
The ability to directly perform floating-point
arithmetic
Assembly vs High Level
Lang
FORTRAN code to average together the N numbers stored
in the array X(I):
      INTEGER*2 I,X(N)
      INTEGER*4 AVG
      .
      .
      .
C     AVERAGE THE ARRAY X, STORING THE RESULT AS AVG:
      AVG=0
      DO 10 I=1,N
10    AVG=AVG+X(I)
      AVG=AVG/N
      .
      .
      .
Assembly vs High Level
Lang
mov cx,n ; cx is used as the loop
; counter. It starts at N and
; counts down to zero.
mov dx,0 ; the dx register stores the
; two most significant bytes of
; the running sum
mov ax,0 ; use ax to store the least
; significant bytes
mov si,offset x ; use the si register to point
; to the currently accessed
; element X(I), starting with
; I=0
addloop:
add ax,word ptr [si] ; add X(I) to the two least
; significant bytes of AVG
adc dx,0 ; add the "carry" into the two
; most significant bytes of AVG
add si,2 ; move si to point to X(I+1)
loop addloop ; decrement cx and loop again
; if not zero
div n ; divides AVG by N
mov avg,ax ; save the result as AVG
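The add/adc pair in the loop above can be modeled outside of assembly. The sketch below uses Python purely as a modeling tool (the function name average16 is made up for illustration). It keeps the running sum in two 16-bit halves, the way DX:AX does, and passes the carry along by hand. It assumes unsigned 16-bit input values.

```python
def average16(values):
    """Average unsigned 16-bit words using a DX:AX style 32-bit running sum."""
    ax = 0  # low 16 bits of the running sum
    dx = 0  # high 16 bits of the running sum
    for word in values:
        total = ax + (word & 0xFFFF)
        carry = total >> 16          # 1 if the 16-bit addition overflowed
        ax = total & 0xFFFF          # models: add ax,word ptr [si]
        dx = (dx + carry) & 0xFFFF   # models: adc dx,0
    # models: div n (the 32-bit value DX:AX divided by n, quotient kept)
    return ((dx << 16) | ax) // len(values)


print(average16([60000, 60000, 30000]))  # 50000
```

The sum 150000 does not fit in 16 bits, which is why the assembly version needs the second register for the upper half.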
Assembly vs High Level
Lang
Writing the assembly version required intimate
knowledge of how the variables x, n, and avg were
stored in memory.
PC System Architecture
Microprocessor
◦ Reads instructions from memory and
executes them
Accesses memory
Does arithmetic and logical operations
Performs other services as well
PC System Architecture
1971:
◦ Intel’s 4004 was the first microprocessor—a 4-bit CPU (like the one
from CS231) that fit all on one chip.
1978:
◦ The 8086 was one of the earliest 16-bit processors.
1981:
◦ IBM uses the 8088 in their little PC project.
1989:
◦ The 80486 includes a floating-point unit in the same chip as the main
processor, and uses RISC-based implementation ideas like pipelining
for greatly increased performance.
1997:
◦ The Pentium II is superscalar, supports multiprocessing, and includes
special instructions for multimedia applications.
2002:
◦ The Pentium 4 runs at insane clock rates (3.06 GHz), implements
extended multimedia instructions and has a large on-chip cache.
PC System Architecture..
Memory
◦ Stores instructions (the program) or data
◦ It appears as a sequence of locations (or
addresses)
Each address stores one byte
◦ Types:
ROM
Stored byte may only be read by the CPU
Cannot be changed
RAM
Stored byte may be both read and
written(changed)
Volatile – all data will be lost after shutdown
Both types are random access
The Process of Assembly
Assembly language is a compiled language
◦ Source-code must first be created with a text-
editor program
◦ Then the source-code will be compiled
◦ Assembly language compilers => assemblers
Auxiliary Programs
◦ First: text-editor(source code editor)
◦ Second: assembler
Assembles source code to generate object code
in the process.
◦ Third: Linker
Combines object code modules created by
assembler
The Process of Assembly..
◦ Fourth: Loader
Built-in to the operating system and is never
explicitly executed.
Takes the “relocatable” code created by the
linker, “loads” it into memory at the lowest
available location, then runs it.
◦ Fifth: Debugger
Environment for running and testing assembly
language programs.
The Process of Assembly..
[Figure: Source Code -> Assembler -> Object Code (together with Other Object Code 1 and Other Object Code 2) -> Linker -> Relocatable Code -> Loader -> RAM]
DOS and Simple File
Operation
DOS
◦ provides the environment in which programs
run.
◦ Provides a set of helpful utility functions
Must be understood in order to create programs
in DOS
Making an assembly Source
Code
You can use the edit command in DOS or
just use Notepad.
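[Figure: block diagram of the 8086 CPU showing the general purpose registers (AH/AL, BH/BL, CH/CL, DH/DL), SP, BP, SI, DI, the segment registers (CS, DS, SS, ES), the Bus Control Unit, the ALU, the CU, the Flag Register, and the Instruction Pointer]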
CPU Registers
Assembly language
◦ Thought goes into the use of the computer
memory and the CPU registers
Register
◦ Like a memory location in that it can store a
byte (or word) value.
◦ Has no address in memory; it is not part of the
computer memory (it is built into the CPU)
CPU Registers
Importance of Registers in Assembly Prog.
◦ Instructions using registers are preferred over those
operating on values stored at memory locations.
◦ Instructions tend to be shorter (less room to
store in memory)
◦ Register-oriented instructions operate faster than
memory-oriented instructions
Since the computer hardware can access a
register much faster than a memory location.
CPU Registers (8086
family)
AX    The Accumulator
BX    The Pointer Register
CX    The Loop Counter
DX    Used for multiplication and division
SI    The “Source” string index register
DI    The “Destination” string index register
BP    Used for passing arguments on the stack
SP    The stack pointer
IP    The instruction pointer
CS    The “code segment” register
DS    The “data segment” register
SS    The “stack segment” register
ES    The “extra segment” register
FLAG  The flag register
Segment Registers
CS   Code Segment    16-bit number that points to the active code segment
DS   Data Segment    16-bit number that points to the active data segment
SS   Stack Segment   16-bit number that points to the active stack segment
ES   Extra Segment   16-bit number that points to the active extra segment
Pointer Registers
IP   Instruction Pointer   16-bit number that points to the offset of the next instruction
SP   Stack Pointer         16-bit number that points to the offset of the current top of the stack
BP   Base Pointer          used to pass data to and from the stack
General Purpose Registers
AX   Accumulator Register   mostly used for calculations and for input/output
BX   Base Register          the only general purpose register that can be used as a pointer into memory
CX   Count Register         used as the counter by the loop instruction
DX   Data Register          used for input/output and by multiply and divide
Index Registers
SI   Source Index        used by string operations as source
DI   Destination Index   used by string operations as destination
CPU registers
◦ AX, BX, CX, & DX are more flexible than the others
Can be used as word registers (16-bit values)
Or as pairs of byte registers (8-bit values)
◦ A general purpose register can be “split”
AX = AH + AL
BX = BH + BL
CX = CH + CL
DX = DH + DL
◦ Ex: If DX = 1234h, then DH = 12h and DL = 34h
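Note that the "+" in the list above means "placed side by side", not arithmetic addition. A small Python model of the split (Python is used only to illustrate the arithmetic; split_word and join_bytes are made-up helper names) might look like this:

```python
def split_word(value):
    """Return the (high byte, low byte) halves of a 16-bit register value."""
    value &= 0xFFFF
    return value >> 8, value & 0xFF


def join_bytes(high, low):
    """Rebuild the 16-bit value from its two byte halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


dh, dl = split_word(0x1234)
print(f"DH = {dh:02X}h, DL = {dl:02X}h")  # DH = 12h, DL = 34h
print(f"DX = {join_bytes(dh, dl):04X}h")  # DX = 1234h
```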
Flag Registers
Consist of 9 status bits(flags)
Called flags because each bit can be either
◦ SET(1)
◦ NOT SET(0)
Flag Registers
Abr.  Name             bit nº  Description
OF    Overflow Flag    11      indicates an overflow when set
DF    Direction Flag   10      used by string operations to determine the direction
IF    Interrupt Flag   9       if set, interrupts are enabled, else disabled
TF    Trap Flag        8       if set, the CPU works in single-step mode
SF    Sign Flag        7       if set, the result of the calculation is negative
Flag Registers..
Abr.  Name             bit nº  Description
ZF    Zero Flag        6       if set, the result of the calculation is zero
AF    Auxiliary Carry  4       carry out of the low 4 bits, used for BCD arithmetic
PF    Parity Flag      2       indicates even or odd parity
CF    Carry Flag       0       holds the carry out of the left-most bit after a calculation
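To make the bit numbers in the two tables concrete, here is a small Python sketch (a model for illustration only; FLAG_BITS and set_flags are made-up names) that takes a 16-bit flags value and lists which of the nine flags are set:

```python
# Bit positions of the nine 8086 status flags, as listed in the tables.
FLAG_BITS = {
    "OF": 11, "DF": 10, "IF": 9, "TF": 8, "SF": 7,
    "ZF": 6, "AF": 4, "PF": 2, "CF": 0,
}


def set_flags(flags_word):
    """Return the names of the flags whose bit is 1 in a 16-bit flags value."""
    return [name for name, bit in FLAG_BITS.items() if (flags_word >> bit) & 1]


# 0241h has bits 9, 6 and 0 set.
print(set_flags(0x0241))  # ['IF', 'ZF', 'CF']
```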
Test it
Do you want to see all these registers and flags?
◦ Go to DOS
◦ Type debug
◦ Type "r"
◦ Then you’ll see all the registers and some
abbreviations for the flags.
◦ Type "q" to quit again.
Memory Segmentation
How DOS uses memory
◦ databus = 16-bit
It can move and store 16 bits (1 word = 2 bytes)
at a time.
◦ If the processor stores 1 word (16 bits), it stores
the bytes in reverse order in memory (low byte first).
Word 1234h ---> memory: 34h (byte) 12h (byte)
Memory: 78h 56h ---> word 5678h
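The byte order described above (low byte first, usually called little-endian) can be checked with a short Python model. This is only an illustration; store_word and load_word are made-up names.

```python
def store_word(value):
    """Return the two bytes of a 16-bit word in the order they sit in memory."""
    return (value & 0xFFFF).to_bytes(2, byteorder="little")


def load_word(memory_bytes):
    """Rebuild a 16-bit word from two bytes read from memory, low byte first."""
    return int.from_bytes(memory_bytes[:2], byteorder="little")


print(store_word(0x1234).hex())             # 3412
print(hex(load_word(bytes([0x78, 0x56]))))  # 0x5678
```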
Memory Segmentation..
The computer divides its memory into segments
◦ Standard in DOS
◦ Segments are 64KB in size and each has a number
◦ These numbers are stored in the segment
registers (see above).
◦ The three main segments are the code, data and
stack segments
They overlap each other almost completely
Try typing d in debug
4576:0100 -> memory address
where 4576 is the segment number and 0100 is the offset
Memory Segmentation..
Segments overlap
◦ The address 0000:0010 = 0001:0000
◦ This is because segments start at paragraph
boundaries
A paragraph = 16 bytes
So a segment starts at an address divisible by 16
◦ 0000:0010 => 0h:10h => 0:16
Memory location: (0*16)+16 = 0+16 = 16 (linear
address)
◦ 0001:0000 => 1h:0h => 1:0
Memory location: (1*16)+0 = 16+0 = 16 (linear
address)
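The calculation above is the same for every segment:offset pair. A minimal Python sketch of it (a model only; linear_address is a made-up name, and the wrap to 20 bits reflects the 1MB address space of the 8086):

```python
def linear_address(segment, offset):
    """8086 address calculation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(linear_address(0x0000, 0x0010))  # 16
print(linear_address(0x0001, 0x0000))  # 16
```

Both pairs give the same linear address, which is exactly what "segments overlap" means.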
My First Program
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
mov ax,seg message
mov ds,ax
mov ah,09
lea dx,message
int 21h
mov ax,4c00h
int 21h
main endp
end main
Names
Identifiers
◦ An identifier is a name you apply to items in
your program. The two types of identifiers are
"name", which refers to the address of a data
item, and "label", which refers to the address
of an instruction. The same rules apply to
names and labels.
Statements
◦ A program is made of a set of statements. There
are two types of statements: "instructions"
such as MOV and LEA, and "directives", which
tell the assembler to perform a specific action,
like ".model small" or ".code".
Statements
Here's the general format of a statement:
identifier - operation - operand(s) - comment
◦ The identifier is the name as explained above.
◦ The operation is an instruction like MOV.
◦ The operands provide information for the
operation to act on.
◦ For example:
MOV (operation) AX,BX (operands).
◦ The comment is a line of text you can add for
the reader; everything the assembler sees after
a ";" is ignored.
Statements
Example
◦ MOV AX,BX ;this is a MOV instruction
How to Assemble
The source code can only be turned into a program
by an assembler and a linker.
◦ A86
◦ MASM
◦ TASM – we will use this one
Install TASM
Then use the tasm.exe and tlink.exe
How to Assemble
• To assemble
– Type the following at the
command prompt:
• cd c:\tasm\bin
• tasm <filename/path of the source code>
– tasm c:\first.asm
• tlink <filename/path of the object code>
– tlink c:\tasm\bin\first.obj or
– tlink first.obj
– To run, call the .exe from the command
prompt.
• Example: our program (First.asm)
.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"
.code
main proc
mov ax,seg message
mov ds,ax
mov ah,09
lea dx,message
int 21h
mov ax,4c00h
int 21h
main endp
end main
Dissecting Code
.model small
◦ Lines that start with a "." are used to provide the assembler
with information.
◦ The word(s) after the "." say what kind of information.
In this case it tells the assembler that the program is small
and doesn't need a lot of memory (one code segment and one
data segment). More on this later.
.stack
◦ This tells the assembler that the "stack" segment starts
here.
The stack is used to store temporary data.
.data
◦ Indicates that the data segment starts here and that the stack
segment ends there.
Dissecting Code..
.code
◦ Indicates that the code segment starts there and the data
segment ends there.
main proc
◦ Code must be in procedures, just like in C or any other language.
◦ This indicates that a procedure called main starts here.
◦ endp states that the procedure is finished.
◦ end main tells the assembler that the program is finished.
◦ It also tells the assembler where to start in the program:
at the procedure called main in this case.
message db "xxxx"
◦ DB means Define Byte, and that is what it does.
◦ In the data segment it defines a number of bytes.
◦ These bytes contain the information between the quotes.
◦ "message" is a name to identify this byte string.
◦ It's called an "identifier".
Memory space for variables
◦ DB (Byte – 8 bit )
◦ DW (Word – 16 bit)
◦ DD (Doubleword – 32 bit)
◦ Example:
foo db 27 ; by default all numbers are decimal
bar dw 3e1h ; appending an "h" means hexadecimal
real_fat_rat dd ? ; "?" means "don't care about the value"
◦ Variable name
Address can’t be changed
Value can be changed
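To see how much room each directive reserves, the three example definitions can be modeled in Python (an illustration only; DIRECTIVE_FORMATS and define are made-up names). One assumption is made loudly here: the "?" value is modeled as 0, although in a real program the content is simply undefined.

```python
import struct

# struct format codes for the three directive sizes ("<" = low byte first).
DIRECTIVE_FORMATS = {"db": "<B", "dw": "<H", "dd": "<I"}


def define(directive, value=0):
    """Return the bytes a db/dw/dd definition reserves (0 stands in for "?")."""
    return struct.pack(DIRECTIVE_FORMATS[directive], value)


# foo db 27 / bar dw 3e1h / real_fat_rat dd ?
data_segment = define("db", 27) + define("dw", 0x3E1) + define("dd")
print(data_segment.hex(" "))  # 1b e1 03 00 00 00 00
print(len(data_segment))      # 7
```

The three variables take 1 + 2 + 4 = 7 bytes, and each variable name stands for the offset where its bytes begin.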
Dissecting Code..
mov ax, seg message
◦ AX is a register.
You use registers all the time, which is why you had to learn
about them first.
◦ MOV is an instruction that moves data.
It takes two "operands".
Here the operands are AX and seg message.
◦ seg message can be seen as a number.
It's the number of the segment that contains "message" (the data segment).
We have to know this number so we can load the DS register
with it.
Otherwise we can't get to the string in memory.
We need to know WHERE the string is located in memory.
◦ The number is loaded into the AX register.
MOV always moves data to the operand left of the comma,
from the operand right of the comma.
The MOV Instruction
Syntax:
◦ MOV destination, source
Allows you to move data into and out of the
registers
◦ Destination
can be either a register or a memory location
◦ Source
can be a register, a memory location, or a numeric
value
Memory-to-memory transfer is NOT ALLOWED
The MOV Instruction
Definitions from earlier:
foo db 27 ; by default all numbers are decimal
bar dw 3e1h ; appending an "h" means hexadecimal
real_fat_rat dd ? ; "?" means "don't care about the value"
Notice the size of the source and destination (they must match in
reg-reg, mem-reg, and reg-mem transfers):
mov ax,bar ; load the word-size register ax with
           ; the word value stored at location bar.
mov dl,foo ; load the byte-size register dl with
           ; the byte value stored at location foo.
mov bx,ax  ; load the word-size register bx with
           ; the word value in ax.
mov bl,ch  ; load the byte-size register bl with
           ; the byte value in ch.
mov bar,si ; store the value in the word-size
           ; register si at the memory location
           ; labelled "bar".
mov foo,dh ; store the byte value in the register
           ; dh at memory location foo.
A constant must be consistent with the destination:
mov ax,5   ; store the word 5 in the ax register.
mov al,5   ; store the byte 5 in the al register.
mov bar,5  ; store the word 5 at location bar.
mov foo,5  ; store the byte 5 at location foo.
Illegal Move Statement
◦ MOV AL, 3172
◦ MOV foo, 3172
Why are the statements above illegal?
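A hint for the question: compare the constant with the size of the destination. The check can be sketched in Python (a model for illustration; fits is a made-up name, and only unsigned constants are considered here):

```python
def fits(value, bits):
    """True if an unsigned constant can be stored in a destination of this width."""
    return 0 <= value < (1 << bits)


print(fits(3172, 8))   # False: AL and foo are byte-size, so the maximum is 255
print(fits(3172, 16))  # True: a word-size destination such as AX would work
print(fits(5, 8))      # True
```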
Dissecting Code..
mov ds,ax
◦ Here it moves the number in the AX register (the number of
the data segment) into the DS register.
◦ We have to load the DS register this way (with two
instructions) because a constant cannot be moved directly
into a segment register.
◦ Just typing "mov ds,seg message" isn't possible.
mov ah, 09
◦ MOV again. This time it loads the AH register with the constant
value nine.
lea dx, message
◦ LEA - Load Effective Address.
This instruction stores the offset within the data segment of the
string message into the DX register.
This offset is the second thing we need to know when we want to
know where "message" is in memory.
So now we have DS:DX.
Dissecting Code..
int 21h
◦ This instruction causes an Interrupt.
◦ The processor calls a routine somewhere in memory.
◦ 21h tells the processor what kind of routine, in this case a DOS
routine.
◦ For now assume that INT just calls a procedure from DOS.
◦ The procedure looks at the AH register to find out what it has to do.
◦ In this example the value 9 in the AH register indicates that the
procedure should write a string (ending in "$") to the screen.
mov ax, 4c00h
◦ Loads the AX register with the constant value 4c00h.
int 21h
◦ This time the AH register contains the value 4ch (AX=4c00h), and to
the DOS procedure that means "exit program".
◦ The value of AL is used as an "exit code"; 00h means "No error".
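The way one interrupt number serves several jobs, selected by AH, can be pictured with a small Python model. This is only a sketch of the idea, not how DOS is implemented: int21h is a made-up function, memory stands for the bytes of the data segment, and only the two services used in our program are modeled.

```python
def int21h(ah, al, memory, dx):
    """Model two DOS services selected by AH; returns (text printed, exit code)."""
    if ah == 0x09:
        # Function 09h: print bytes starting at offset DX up to, not including, "$".
        end = memory.index(b"$", dx)
        return memory[dx:end].decode("ascii"), None
    if ah == 0x4C:
        # Function 4Ch: terminate the program, AL is the exit code.
        return "", al
    raise NotImplementedError(f"function {ah:02X}h is not modeled")


data = b"Hello world, I'm learning Assembly !!!$"
print(int21h(0x09, 0x00, data, 0))  # ("Hello world, I'm learning Assembly !!!", None)
print(int21h(0x4C, 0x00, data, 0))  # ('', 0)
```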
After running:
◦ Go to DOS and type "debug FIRST.exe".
◦ Type d -> display some addresses
◦ Type u -> you will see the start of the program:
Segment Number & Offset   Machine Code   Instruction
0F77:0000                 B8790F         MOV AX,0F79
0F77:0003                 8ED8           MOV DS,AX
0F77:0005                 B409           MOV AH,09
0F77:0000 B8790F MOV AX,0F79
originally: mov ax, seg message
B8 -> mov ax
790F -> the number 0F79h, stored low byte first
It means that the data is stored in the segment with number 0F79.
The other instruction, lea dx,message,
turned into mov dx,0.
◦ So that means that the offset of the string is
0 --> 0F79:0000.
◦ Try typing d 0F79:0000
◦ Calculating the other address
The code segment number 0F77 is 2 less than 0F79
2 paragraphs = 32 bytes (0002:0000)
So the same location is also 0F77:0020
The Stack
The stack is a place where data is
temporarily stored
The SS and SP registers point to that place
like this: SS:SP
◦ So the SS register is the segment and the SP
register contains the offset
There are a few instructions that make use
of the stack
◦ PUSH - Push a value on the stack
◦ POP - retrieve that value from the stack
The Stack
MOV AX,1234H
PUSH AX
MOV AH,09
INT 21H
POP AX
◦ The final value of AX will be 1234h.
First we load 1234h into AX,
then we push that value to the stack.
We now store 9 in AH, so AX will be 0934h
and execute an INT.
Then we pop the AX register.
We retrieve the pushed value from the stack.
So AX contains 1234h again
The Stack
MOV AX, 1234H
MOV BX, 5678H
PUSH AX
POP BX
◦ We pushed AX onto the stack
◦ and popped that value into BX.
◦ What are the final values of AX and BX?
The Stack
A stack is easily created with the .stack directive,
which will create a stack of 1024 bytes.
The stack uses a LIFO system (Last In First
Out)
The Stack
MOV AX,1234H
MOV BX,5678H
PUSH AX
PUSH BX
POP AX
POP BX
First the value 1234h was pushed, and after that the
value 5678h was pushed onto the stack.
According to LIFO, 5678h comes off first, so AX will
pop that value and BX will pop the next.
What are the values of AX and BX?
How does the stack look in
memory?
It "grows" downwards in memory.
When you push a word (2 bytes), for
example, SP is decreased by 2 and the word
is stored at SS:SP.
So in the beginning SP points to the top of
the stack and (if you don't pay attention) it
can grow so far downwards in memory
that it overwrites the program's code or data.
A major system crash is the result.
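The downward growth can be pictured with a tiny Python model of the stack (an illustration only; Stack8086 is a made-up class, the 1024-byte size matches the .stack default mentioned earlier, and the overflow check is an addition of this sketch, since the real CPU would not stop you):

```python
class Stack8086:
    """Tiny model of a 16-bit stack that grows toward lower addresses."""

    def __init__(self, size=1024):
        self.memory = bytearray(size)  # stands for the stack segment
        self.sp = size                 # SP starts just past the top

    def push(self, word):
        if self.sp < 2:
            # Real hardware would keep going and overwrite other memory.
            raise OverflowError("stack would grow below its segment")
        self.sp -= 2                   # SP is decreased by 2 first
        self.memory[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")

    def pop(self):
        word = int.from_bytes(self.memory[self.sp:self.sp + 2], "little")
        self.sp += 2                   # SP is increased by 2 after the read
        return word


stack = Stack8086()
stack.push(0x1234)   # PUSH AX
stack.push(0x5678)   # PUSH BX
ax = stack.pop()     # POP AX
bx = stack.pop()     # POP BX
print(hex(ax), hex(bx), stack.sp)  # 0x5678 0x1234 1024
```

Pushing in one order and popping in the same order swaps the two values, which is the LIFO behaviour described earlier.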
Congratulations!!
If you fully understand this stuff (registers,
flags, segments, stack, names, etc.) you
may, from now on, call yourself a
"Level 0 Assembly Coder"