Computer
Organization &
BITS
Architecture DS
Rao
Pilani 98663582
Pilani Campus 71
Computer Architecture and
Organization-1
• Architecture is those attributes visible to the programmer and having direct
impact on the logical execution of the program
– Instruction set, Number of bits used for data types, I/O mechanisms,
Memory addressing techniques
– e.g. x86 architecture, IBM/360 architecture
• Organization is implementation of computer system in terms of its
interconnection of functional units
– Control signals
– Interfaces between computer and peripherals
– Memory technology
2
BITS Pilani, Pilani
Computer
Architecture
and Organization-2
• Architecture Question?
– Is there a multiply/division instruction available?
• Organization Question?
– Is multiplication implemented by separate
hardware or is it done by repeated addition?
3
BITS Pilani, Pilani
Computer Organization vs. Computer
Architecture
• For example, it is an architectural design issue whether a computer will have a multiply
instruction.
• It is an organizational issue whether that instruction will be implemented by a special
multiply unit or by a mechanism that makes repeated use of the add unit of the system
• Many computer manufacturers offer a family of computer models, all with the same
architecture but with differences in organization.
• Different models in the family have different price and performance characteristics.
• A prominent example of both these phenomena is the IBM system /370 architecture. It is
first introduced in 1970 and included a number of models.
• This gives code compatibility
— At least backwards
• Organization differs between different versions
BITS Pilani, Deemed to be University under Section 3 of UGC
Structure and Functions of various components of
a computer
system.
• Structure is the way in which components relate to each other.
Structure = static relations among components
• Function is the operation of individual components as part of the structure.
Function = dynamic behaviour of each component
• In terms of description, we have two choices, starting at the bottom and
building up to a complete description or beginning with a top view and
decomposing the system into the subparts.
• The computer system will be described from top down.
BITS Pilani, Deemed to be University under Section 3 of UGC
How to Describe
Computer System?
• Computer is a complex system!!!
– Contains millions of electronics components
• Function is the operation of individual
components as part of the structure
• Structure is the way in which components are
related to each other
6
BITS Pilani, Pilani
Functional View of
Computer
• All computer functions are:
— Data processing
— Data storage
— Data movement
— Control
7
BITS Pilani, Pilani
Computer
Operations
Operations (b) Storage Operation (c) Processing from/to storage
Operations (a) Data movement
Operation (d)
Processing from storage to I/O
8
BITS Pilani, Pilani
Structu
re
• The figure shows the simplest depiction of a computer.
• The computer interacts in some fashion with its external environment.
• In general, all of its linkages to the external environment can be classified as peripheral
devices or communication lines.
Peripherals
Communication lines
COMPUTER
•Storage
•processing
The Computer
BITS Pilani, Deemed to be University under Section 3 of UGC
Structural View of a
Computer
Peripherals Comput
er
Central Main
Processin Memor
g Unit y
Computer
Systems
Interconnecti
on
Input
Outp
Communication ut
lines
10
BITS Pilani, Pilani
• There are four main structural components:
• Central processing unit (CPU): controls the operation of the
computer and performs its data processing functions; often simply
referred to as processor.
• Main memory: Stores data.
• I/O: Moves data between the computer and its external
environment.
• System Interconnection: Some mechanism that provides for
communication among CPU, main memory and I/O.
BITS Pilani, Deemed to be University under Section 3 of UGC
Structural View of
the CPU
CPU
Comput Arithmet
er Register ic and
I/O s Logic
System CPU Unit
Bus
Internal CPU
Memor Interconnecti
y on
Contr
ol
Unit
12
BITS Pilani, Pilani
Structural View of
the CPU
•A common example of system interconnection is by means of a system bus, consisting
of number of conducting wires to which all the other components attach.
•The most complex component is the CPU and its major structural components are as
follows:
•Control unit: Controls the operation of the CPU and hence the computer.
•Arithmetic and logic unit (ALU): Performs the computers data processing functions.
•Registers: Provides storage internal to the CPU.
•CPU Interconnection: Some mechanism that provides for communication among the
control unit, ALU and registers.
BITS Pilani, Pilani
Brief History of
Computers
• ENIAC was the first general purpose electronic digital
computer
– By John Mauchly and John Eckert in 1946
• Von Neumann Machine (1946) called as IAS( Institute
of advanced study)
– Based on stored program concept
• PDP-1 Computer (1957)
– Developed by Digital Equipment Corporation (DEC)
– First step towards mini computers
• IBM SYSTEM/360 (1964)
– First planned family of computers
14
BITS Pilani, Pilani
ENIAC -
background
• Electronic Numerical Integrator And Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
– Too late for war effort
• Used until 1955
15
BITS Pilani, Pilani
ENIAC -
details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
16
BITS Pilani, Pilani
von
Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from
memory and executing
• Input and output equipment operated by control
unit
• Princeton Institute for Advanced Studies
– IAS ( Institute for Advanced Study)
• Completed 1952
17
BITS Pilani, Pilani
Structure of von Neumann
machine
18
BITS Pilani, Pilani
IAS -
details
• 1000 x 40 bit words
– Binary number
– memory of the IAS consists of 4,096 storage locations, called words, of 40
binary digits (bits) each
– 2 x 20 bit instructions
• Set of registers (storage in CPU)
– Memory Buffer Register
– Memory Address Register
– Instruction Register
– Instruction Buffer Register
– Program Counter
– Accumulator
19
– Multiplier Quotient
BITS Pilani, Pilani
IAS -
details
20
BITS Pilani, Pilani
Structure of
IAS – detail
21
BITS Pilani, Pilani
Commercial
Computers
• 1947 - Eckert-Mauchly Computer Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
– Faster
– More memory
IBM
• Punched-card processing equipment
• 1953 - the 701
– IBM’s first stored program computer
– Scientific calculations
• 1955 - the 702
– Business applications
• Lead to 700/7000 series
22
BITS Pilani, Pilani
Transisto
rs
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al
23
BITS Pilani, Pilani
Transistor Based
Computers
• Second generation machines
• NCR & RCA produced small transistor
machines
• IBM 7000
• DEC - 1957
– Produced PDP-1(Programmed Data
Processor)
24
BITS Pilani, Pilani
Microelectron
ics
• Literally - “small electronics”
• A computer is made up of gates, memory cells
and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer
25
BITS Pilani, Pilani
DEC
PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
– $100k+ for IBM 360
• Embedded applications & OEM(original equipment
manufacturers)
• BUS STRUCTURE
BITS Pilani, Pilani
DEC - PDP-8 Bus
Structure
models of the PDP-8 used a structure that became virtually universal for
microcomputers: the bus structure
BITS Pilani, Pilani
IBM 360
series
• 1964
• Replaced (& not compatible with) 7000 series
• First planned “family” of computers
– Similar or identical instruction sets
– Similar or identical O/S
– Increasing speed
– Increasing number of I/O ports (i.e. more terminals)
– Increased memory size
– Increased cost
• Multiplexed switch structure
BITS Pilani, Pilani
Semiconductor
Memory
• 1970
• Fairchild
• Size of a single core
– i.e. 1 bit of magnetic core storage
• Holds 256 bits
• Non-destructive read
• Much faster than core
• Capacity approximately doubles each year
BITS Pilani, Pilani
Int
el
• 1971 - 4004
– First microprocessor
– All CPU components on a single chip
– 4 bit
• Followed in 1972 by 8008
– 8 bit
– Both designed for specific applications
• 1974 - 8080
– Intel’s first general purpose microprocessor
BITS Pilani, Pilani
Computer
Generations
• Vacuum tube (1946-1957) & Transistor (1958-
1964)
• Integrated Circuits
• Small scale integration - 1965 on
– Up to 100 devices on a chip
• Medium scale integration - to 1971
– 100 - 3,000 devices on a chip
• Large scale integration - 1971-1977
– 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
– 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 onwards
– Over 100,000,000 devices on a chip
31
BITS Pilani, Pilani
x86
Evolution-1
• 1971 - 4004
– First microprocessor of 4 bit
– All CPU components on a single chip
– Followed in 1972 by 8008 (8 bit processor)
– Both designed for specific applications
• 8080
– First general purpose microprocessor
– Process/move 8 bit data at a time
– Used in first personal computer – Altair
32
BITS Pilani, Pilani
x86
Evolution-2
• 8086
– Much more powerful (16 bit data)
– Instruction cache, pre-fetch few instructions
– 8088 (8 bit external bus) used in first IBM PC
• 80286
– 16 MByte memory addressable
– Up from 1MB (in 8086)
• 80386
– 32 bit processor with multitasking support
33
BITS Pilani, Pilani
x86
Evolution-3
• 80486
– Sophisticated powerful cache and instruction pipelining
– Built in maths co-processor
• Pentium
– Superscalar
– Multiple instructions executed in parallel
• Pentium Pro
– Increased superscalar organization
– Aggressive register renaming
– Branch prediction and Data flow analysis
34
BITS Pilani, Pilani
x86
Evolution-4
• Pentium II
– MMX technology, graphics, video & audio processing
• Pentium III
– Additional floating point instructions for 3D graphics
• Pentium 4
– Further floating point and multimedia enhancements
• Itanium Series
– 64 bit with Hardware enhancements to increase speed
• What’s next???
– Multi core architectures
35
BITS Pilani, Pilani
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
Fetch and Execute cycles
BITS Pilani, Pilani
What is a
program?
• A logical group of instructions to achieve
a specific task.
• An instruction is a sequence of steps
• For each step an arithmetic or
logical operation or
data movement is done.
BITS Pilani, Pilani
Addition of Two
Numbers
Start
Load R1,A
Load R2,B
R3=R1+R2
Store
C,R3
End of
program
38
I P ila n i,
Components (start
here)
• The Control Unit and the Arithmetic and Logic
Unit constitute the Central Processing Unit
• Data and instructions need to get into the
system and results out
– Input/output
• Temporary storage of code and results is
needed
– Main memory
BITS Pilani, Pilani
Computer
Components: Top
Level View
BITS Pilani, Pilani
Instruction
Cycle
• Two steps:
– Fetch
– Execute
BITS Pilani, Pilani
Fetch
Cycle
• Program Counter (PC) holds address of next
instruction to be fetched
• Processor fetches instruction from memory
location pointed to by PC
• Increment PC
– Unless told otherwise
• Instruction loaded into Instruction Register
(IR)
• Processor interprets instruction and performs
required actions
BITS Pilani, Pilani
Execute
Cycle
• Processor-memory
– data transfer between CPU and main memory
• Processor I/O
– Data transfer between CPU and I/O module
• Data processing
– Some arithmetic or logical operation on data
• Control
– Alteration of sequence of operations
– e.g. jump
• Combination of above
BITS Pilani, Pilani
0 34 15
Opcode Address
Instruction format
• The instruction format provides 4 bits for the opcode, so that there can be as many as 24 = 16
different opcodes.
• ,and up to 212 = 4096 (4K) words of memory can be directly addressed.
opcode function
0001 Load AC from memory
0010 Store AC to Memory
0101 Add to AC from
Memory
BITS Pilani, Pilani
Example of Program
Execution
The program fragment shown
adds the contents of the
memory word at address 940
to the contents of the
memory word at address 941
and stores the result in the
latter location
BITS Pilani, Pilani
The Role of
Performance
46
BITS Pilani, Deemed to be University under Section 3 of UGC
47
BITS Pilani, Deemed to be University under Section 3 of UGC
CPU Performance
• Equation
Multiple aspects to performance: helps to isolate them
• Latency = seconds / program =
– (insns / program) * (cycles / insn) * (seconds / cycle)
– Insns / program: dynamic insn count
• Impacted by program, compiler, ISA
– Cycles / insn: CPI
• Impacted by program, compiler, ISA, micro-arch
– Seconds / cycle: clock period (Hz)
• Impacted by micro-arch, technology
• For low latency (better performance) minimize all three
– Difficult: often pull against one another
48
Example we have seen: RISC vs.er Section 3 of UGC Act, 1956
–
PERFORMANCE
Performance is one of the key parameter to
evaluate a system.
If we say one system is better than another
we consider
• Response Time
• Throughput
BITS Pilani, Deemed to be University under Section 3 of UGC
Computer
Performance
• Performance is one of the key parameter to-
– Evaluate processor hardware
– Measure requirement for new systems
• When we say one computer has better
performance than another, what does it mean?
– Criterion for the performance?
• Response Time (Single computer), Throughput
(Data center)
50
BITS Pilani, Pilani
Clock
Rate
• Operation performed by processor are governed by
system clock (fundamental level of processor speed
measurement)
– Generated by quartz crystal
one clock period
• A clock cycle is the basic unit of time to execute one
operation.
• Clock Rate (clock cycles per second in MHz or GHz) is
inverse of clock cycle time (clock period) 51
BITS Pilani, Pilani
Is Clock Rate
Enough?
• As we know, instruction execution takes several discrete steps
– Fetching, decoding, ALU operation, fetching data etc.
– It takes multiple clock cycles for its execution
• Different instructions takes different number of cycles for their
execution
– LOAD, ADD, SUB, JUMP etc.
• Thus clock speed doesn’t tell the whole story!!!
52
BITS Pilani, Pilani
Performance: Application
Specific
• Performance
– How a processor performs when executing a given application
– Application performance depends upon
• Speed of the processor
• Instruction set
• Choice of language
• Efficiency of the compiler
• Programming skill of the user
53
BITS Pilani, Pilani
CPU
Performance
• To maximize performance, need to minimize
execution time
performance = 1 / execution_time
If X is n times faster than Y, then
performanceX execution_timeY
= =n
performanceY execution_timeX
54
BITS Pilani, Pilani
Cycles Per Instruction
(CPI)
• For any given processor, number of cycles
required varies for different types of
instructions
– e.g. load, store, branch, add, mul etc.
• Hence CPI is not a constant value for a
processor
• Needs to calculate average CPI for
processor
56
BITS Pilani, Pilani
CPU Performance and it’s
Factors
• Program execution time (T) = Ic x CPI x 𝑟
– 𝑟 =cycle time, Ic = no. of instructions in the program
• Instruction execution also requires memory access
– T = Ic x [p +(m x k] x 𝑟
– p=processor cycles, m = memory references
– k = ratio of memory cycles to processor cycles
• These performance factors are influenced by
– ISA,
– compiler,(how effective the compiler is in producing an efficient machine language program
from a high-level language program)
– processor implementation
– and memory hierarchy 57
BITS Pilani, Pilani
MIPS
Rate
• Common measure of performance
– Millions Instructions Per Second (MIPS) rate
– Ic/(Execution time x 106)
– This can be written as:
=Ic/ cycle time x (Ic x CPI )x 106
= f/ CPI x 106
57
BITS Pilani, Pilani
In class
Example1
• Consider the execution of a program that results in the execution of 2 million
instructions on a 400-MHz processor. The program consists of four major types of
instructions. The instruction mix and the CPI for each instruction type are given
below, based on the result of a program trace experiment
The average CPI when the program is executed on a uniprocessor with the above trace results is
CPI = 0.6 + (2 * 0.18) + (4 * 0.12) + (8 * 0.1) = 2.24.
MIPS rate = Ic /T * 106 = f /CPI * 106
The corresponding MIPS rate is (400 * 106 )/(2.24 * 106 ) ≈ 178.
58
BITS Pilani, Pilani
Example of Performance
Measure
59
BITS Pilani, Pilani
Limitation of MIPS
Rate
• MIPS rate or instruction execution rate is also
inadequate to measure CPU performance. Why?
– Because of differences in ISA
– Ex. To execute a high level language statement
A=B+C (A,B and C are in memory) may need
different number of low level instructions for
different ISA
60
BITS Pilani, Pilani
Amdahl’s Law
Formula
• For program running on single processor
— Fraction f of code infinitely parallelizable with no scheduling
overhead
— Fraction (1-f) of code inherently serial
— T is total execution time for program on single processor
— N is number of processors that fully exploit parralle portions of code
• Conclusions
– f small, parallel processors has little effect
– N ->∞, speedup bound by 1/(1 – f)
• Diminishing returns for using more
61
processors BITS Pilani, Pilani
Amdahl’s Law
1 How much will an optimization
improve performance?
P
(1 P) S P = proportion of running time
affected by optimization
S = speedup
What if I speedup 25% of a program’s
execution by 2x? 1.14x speedup
What if I speedup 25% of a program’s
execution by ∞? 1.33x speedup
BITS Pilani, Pilani Ca
In class
Example2
Suppose we have made the following measurements:
Frequency of FP operations = 50%
Average CPI of FP operations = 4.0 Average CPI
of other instructions = 1.33 Frequency
of FPSQR= 20%
CPI of FPSQR = 20
Assume that the two design alternatives are to
decrease the CPI of FPSQR to 2 or to
decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives
using the processor performance equation.
63
BITS Pilani, Pilani
Apply Amdahl's Law to the above question and observe the result
Amdahl’s Law eqn:
With FPSQR Speedupoverall
=
With FP Speedupoverall=
We get the same result using Amdahl's Law
BITS Pilani, Pilani
In class
Example3
A benchmark program is run on a 40 MHz processor.The executed program consists of
100,000 instruction executions, with the following instruction mix and clock cycle count:
Instruction Type Instruction Count Cycles per Instruction
arithmetic 45000 1
Data transfer 32000 2
Floating point 15000 2
Control transfer 8000 2
Determine the effective CPI, MIPS rate, and execution time for this program.
BITS Pilani, Deemed to be University under Section 3 of UGC
In class Example3
contnd
SOL: CPI =1.55,MIPS RATE = 25.8,
execution time =38.75ms 67
BITS Pilani, Deemed to be University under Section 3 of UGC
In class
Example4
Q2. Early examples of CISC and RISC are the VAX 11/780 and the IBM RS/6000,
respectively. Using a typical benchmark program, the following machine
characteristics results:
The final column shows that the VAX required 12 times longer than the IBM
measured in CPU
time.
•What is the relative size of the instruction count of the machine code for this
benchmark program running on the two machines?
Processor Clock Frequency Performance CPU time
• What are the CPI (Cycles Per Instruction) values for the two machines?
VAX 11/780 5 MHz 1MIPS 12 x seconds
IBM RS/6000 25 MHz 18 MIPS x seconds
68
BITS Pilani, Deemed to be University under Section 3 of UGC
Sol: We can express the MIPS rate as: [(MIPS rate)/106] = Ic/T.
So that:
Ic = T [(MIPS rate)/106].
a) The ratio of the instruction count of the RS/6000 to the VAX is
[x 18]/[12x 1] = 1.5
b)For the Vax, CPI = (5 MHz)/(1 MIPS) = 5
For the RS/6000, CPI = 25/18 = 1.39
69
BITS Pilani, Deemed to be University under Section 3 of UGC
Benchmark
Programs
• It is a collection of a programs that provides representative
test of a computer in a particular application area
– e.g. SPEC (System Performance Evaluation Corporation)
benchmark suites
– SPEC CPU 2006 is used for measuring performance for the
computational based applications
69
BITS Pilani, Pilani
Review
Questions
• Differentiate between computer organization and architecture?
• What are the four main functions of a computer?
• What are the basic structural components of a computer?
• Describe the computer generations in brief.
• What was the first general purpose microprocessor?
• Consider two implementation of the same ISA. Computer A clock cycle time of 250
ns and a CPI of 2 for a program. Computer B has a clock cycle time of 500 ns and a
CPI of 1.2 for the same program. Which computer is faster for this program and by
how much?
• A program runs on computer A with a 2 GHz clock in 10 seconds. Another
computer B with 4 GHz run this program in 6 seconds. To accomplish this,
computer B will require P times as many clock cycles as computer A to run the
program. Find the value of P.
70
BITS Pilani, Pilani
Performance revision
Consider two different machines, with two different instruction
sets, both of which have a clock rate of 200 MHz. The following
measurements are recorded on the two machines running a
given set of
Instruction benchmark
Type programs:
Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
Machine B
Arithmetic and 10 1
logic
Load and store 8 2
Branch 2 4
Others
a. Determine the effective 4CPI, MIPS rate, and execution time
3 for each
machine.
b. Comment on the results.
BITS Pilani, Deemed to be University under Section 3 of UGC
Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.
BITS Pilani, Deemed to be University under Section 3 of UGC
Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.
Time for 1 cycle = 1/f Hz
Time for 1 instruction= CPI X 1/f
CPI X 1/f secs for 1 instruction
There fore in 1 sec no of instructions= f/ CPI
And this MIPS= f/ (CPI X 106)
BITS Pilani, Deemed to be University under Section 3 of UGC
Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.
And this MIPS= f/ (CPI X 106)
BITS Pilani, Deemed to be University under Section 3 of UGC
Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine B
Arithmetic and 10 1
logic
Load and store 8 2
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani|Dubai|Goa|Hyderabad
Contact Session 3
Computer System Components and
Interconnections
Inter
Connections
• All the units must be connected
• Different type of connection for different type
of unit
– Memory
– Input/Output
– CPU
BITS Pilani, Deemed to be University under Section 3 of UGC
Computer
Modules
■I/O module: I/O is functionally similar to memory.
There are two operations; read and write.
I/O module may control more than one external
device.
Refer to each of the interfaces to an external
device as a port and give each a unique address
(e.g., 0, 1, … , M-1).
In addition, there are external data paths for
the input and output of data with an external
device.
Finally, an I/O module may be able to send
interrupt signals to the processor.
BITS Pilani, Deemed to be University under Section 3 of UGC
CPU & MEMORY
Connection
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
•Receives (& acts on) interrupts
Memory Connection
• Receives and sends data
• Receives addresses (of
locations)
• Receives control signals
– Read
– Write
– Timing
BITS Pilani, Deemed to be University under Section 3 of UGC
Input/Output
Connection
• Similar to memory from computer’s viewpoint
• Output
– Receive data from computer
– Send data to peripheral
• Input
– Receive data from peripheral
– Send data to computer
• Receive control signals from computer
• Send control signals to peripherals
– e.g. spin disk
• Receive addresses from computer
– e.g. port number to identify peripheral
• Send interrupt signals (control)
BITS Pilani, Deemed to be University under Section 3 of UGC
Input/Output
Connection
• Similar to memory from computer’s viewpoint
• Output
– Receive data from computer
– Send data to peripheral
• Input
– Receive data from peripheral
– Send data to computer
• Receive control signals from computer
• Send control signals to peripherals
– e.g. spin disk
• Receive addresses from computer
– e.g. port number to identify
• peripheral Send interrupt signals (control)
BITS Pilani, Deemed to be University under Section 3 of UGC
Buses
• There are a number of possible interconnection systems
• it has gradually given way to various point-to-point interconnection
structures, which now dominate computer system design
• Single and multiple BUS structures are most common
• e.g. Control/Address/Data bus (PC)
• e.g. Unibus (DEC-PDP)
A communication pathway connecting two or more devices is a bus.
• Parallel lines on circuit boards
• Ribbon cables
• Strip connectors on motherboards
• Sets of wires
BITS Pilani, Deemed to be University under Section 3 of UGC
Different
Bus
Data Bus
• Carries data
– Remember that there is no difference between “data” and “instruction”
at this level
• Width is a key determinant of performance
– 8, 16, 32, 64 bit
Address bus
Identify the source or destination of data
e.g. CPU needs to read an instruction (data) from a given
location in memory
Bus width determines maximum memory capacity of
system
e.g. 8080 has 16 bit address bus giving 64k address
BITS Pilani, Deemed to be University under Section 3 of UGC
space
Control
Bus
• Control and timing information
– Memory read/write signal
– Interrupt request
– Clock signals
BITS Pilani, Deemed to be University under Section 3 of UGC
Single Bus
Problems
• Lots of devices on one bus leads to:
– Propagation delays
• Long data paths mean that co-ordination of bus use can
adversely affect performance
• If aggregate data transfer approaches bus capacity
• Most systems use multiple buses to overcome
these problems
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple Bus
Hierarchies
A great number of devices on a bus will cause performance to suffer
Propagation delay - the time it takes for devices to coordinate the use of the bus
The bus may become a bottleneck as the aggregate data transfer demand
approaches the capacity of the bus (in available transfer cycles/second)
Traditional Hierarchical Bus Architecture
Use of a cache structure insulates CPU from frequent accesses to main memory
Main memory can be moved off local bus to a system bus
Expansion bus interface
o buffers data transfers between system bus and I/O controllers on expansion
bus
o insulates memory-to-processor traffic from I/O traffic
BITS Pilani, Deemed to be University under Section 3 of UGC
Traditional
(ISA) (with
cache) Main memory can be
moved off of the local
bus onto a system bus.
make use of one or
more expansion
buses
small computer system
interface
LANs
WAN
s BITS Pilani, Deemed to be University under Section 3 of UGC
High Performance
Bus High-performance Hierarchical Bus
Architecture
Traditional hierarchical bus breaks
down as higher and higher
performance is seen in the I/O devices
to high-speed LANs, such as Fast Ethernet at Incorporates a high-speed bus
100 Mbps
o specifically designed to
support high- capacity I/O
devices
o brings high-demand
devices into closer integration
with the processor and at the
same time is independent of the
processor
o Changes in processor
architecture do not affect the high-
BITS Pilani, Deemed to be University under Section 3 of UGC
speed bus, and vice versa
Bus
Types
• Dedicated
– Separate data & address lines
• Multiplexed
– Shared lines
– Address valid or data valid control line
– Advantage - fewer lines
– Disadvantages
• More complex control
• Reduction in performance (cannot take place in parallel)
BITS Pilani, Deemed to be University under Section 3 of UGC
Bus
Arbitration
• More than one module controlling the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one time
• Arbitration may be centralised or distributed
• Centralised
– Single hardware device controlling bus access
• Bus Controller
– May be part of CPU or separate
• Distributed
– Each module may claim the bus
– Control logic on all modules
BITS Pilani, Deemed to be University under Section 3 of UGC
What is Bus Arbitration, Bus Mastering and DMA?
• Bus Arbitration – an elaborate system for resolving bus
control conflicts and assigning priorities to the requests for
control of the bus.
• Bus Mastering – a method of enabling different device
controllers on the bus to ‘talk’ to one an other,
without having to go through the CPU.
• DMA(Direct Memory Access) – a method of transferring data
from a hard disk to main memory without having to go
through the CPU.
91
BITS Pilani, Deemed to be University under Section 3 of UGC
Bus Arbitration Methods
• Centralized
Centralized bus arbitration requires hardware (arbiter)that
will grant the bus to one of the requesting devices. This
hardware can be part of the CPU or it can be a separate
device on the motherboard.
• Decentralized
Decentralized arbitration there isn't an arbiter, so the devices
have to decide who goes next. This makes the devices more
complicated, but saves the expense of having an arbiter.
92
BITS Pilani, Deemed to be University under Section 3 of UGC
Centralized
arbitration BBS
Y
BR
Processor
DMA DMA
controller controller
BG 1 BG 2
1 2
•Bus arbiter may be the processor or a separate unit connected to
the bus.
•Normally, the processor is the bus master, unless it grants bus
membership
to one of the DMA controllers.
•DMA controller requests the control of the bus by asserting the
Bus Request (BR) line.
•In response, the processor activates the Bus-Grant1 (BG1) line,
indicating
that the controller may use the bus when it is free.
•BG1 signal is connected to all DMA controllers in a daisy 93 chain
fashion. BITS Pilani, Deemed to be University under Section 3 of UGC
Centralized arbitration
(contd..)
DMA controller 2
asserts the BR Tim
signal. Processor
e
BR
asserts
the BG1 signal
BG1 BG1 signal
propagates to
BG2 DMA#2.
B BS
Y
Bus
mast
er P rocess DMA controller P rocess
or 2 or
Processor relinquishes control
of the bus by setting BBSY to
1.
94
BITS Pilani, Deemed to be University under Section 3 of UGC
Centralized arbitration
(contd..)
• Centralized arbitration scheme with one Bus-Request (BR) line
and one Bus-Grant (BG) line forming a daisy chain.
• Several pairs of BR and BG lines are possible, perhaps one per
device as in the case of interrupts.
• Bus arbiter has to ensure that only one request is granted at
any given time.
• It may do so according to a fixed priority scheme, or a
rotating
priority scheme.
• Rotating priority scheme:
There are four devices, and initial priority is 1,2,3,4.
After the request from device 1 is granted, the priority
changes to 2,3,4,1. 95
BITS Pilani, Deemed to be University under Section 3 of UGC
Distributed
arbitration
• All devices waiting to use the bus share the responsibility
of carrying out the arbitration process.
Arbitration process does not depend on a central arbiter and hence distributed
arbitration has higher reliability.
• Each device is assigned a 4-bit ID number.
• All the devices are connected using 5 lines, 4 arbitration
lines to transmit the ID, and one line for the Start-
Arbitration signal.
• To request the bus a device:
Asserts the Start-Arbitration signal.
Places its 4-bit ID number on the arbitration lines.
• The pattern that appears on the arbitration lines is the
logical-OR of all the 4-bit device IDs placed on the
arbitration lines. BITS Pilani, Deemed to be University under Section 3 of UGC
25
Distributed
•Device Aarbitration
has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
•Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
•Pattern that appears on the arbitration lines is the logical OR of the
patterns: -
Pattern 0111 appears on the arbitration lines.
Arbitration process:
•Each device compares the pattern that appears on the arbitration
lines to its own ID, starting with MSB.
•If it detects a difference, it transmits 0s on the arbitration lines for
that and all
lower bit positions.
•Device A compares its ID 5 with a pattern 0101 to pattern 0111.
•It detects a difference at bit position 2, as a result, it transmits a
pattern 0100 on the arbitration lines.
•The pattern that appears on the arbitration lines is the logical-OR 26of
0100 and 0110, which is 0110. BITS Pilani, Deemed to be University under Section 3 of
Timin
g
• Co-ordination of events on bus
• Synchronous
– Events determined by clock signals and synchronized
on leading edge of clock
– All devices can read clock line
– Usually a single cycle for an event
Asynchronous
⮚ The occurrence of one event on a bus
depends on the occurrence of a previous event
⮚ Events on the bus are not synchronized with
the clock
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous
bus
Bus clock
Bus cycle
28
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.) T im
e
Bus
clock
Address
and
comman
d
Dat
a
t0 t1 t2
Bus
Master places the cycle
device address Addressed slave
and command on places Master “strobes” the
the bus, and data on the data data on the data
indicates that lines lines into its input
it is a Read buffer, for a Read
•In operation. operation.
case of a Write operation, the master places the data on the bus
along with the address and commands at time t0.
•The slave strobes the data into its input buffer 10
at time t2. 0
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.)
• Once the master places the device address and command on the bus, it
takes time for this information to propagate to the devices:
– This time depends on the physical and electrical characteristics of
the
bus.
• Also, all the devices have to be given enough time to decode the
address
and control signals, so that the addressed slave can place data on the
bus.
• Width of the pulse t1 - t0 depends on:
– Maximum propagation delay between two devices connected to
the bus.
10
1
– Time taken by all BITS
thePilani,
devices to todecode
Deemed theunder
be University address
Sectionand control
3 of UGC
Synchronous bus(contnd.)
• At the end of the clock cycle, at time t2, the master strobes the data on the
data lines into its input buffer if it’s a Read operation.
– “Strobe” means to capture the values of the data and store them into
a buffer.
• When data are to be loaded into a storage buffer register, the data should
be available for a period longer than the setup time of the device.
• Width of the pulse t2 - t1 should be longer than:
– Maximum propagation time of the bus plus
– Set up time of the input buffer register of
the master.
10
2
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.) Time
Address & Bus
command clock Data reaches
appear on Seen the master.
tAM
the bus. by
master
Address
and
Dat
comman
Address & d a
tDM
command
Seen by
reach the slave tAS
slave. Address Data
and appears
on the bus.
comman
d
Dat
a
tDS
t t 2
•Signals do not appear on
0
the bus as soon as they are placed on the
1
t
bus, due to the propagation delay in the interface circuits.
•Signals reach the devices after a propagation delay which
depends on the characteristics of the bus.
•Data must remain on the bus for some time after t2 equal to
10
the hold time of the buffer. 3
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.)
• Most buses have control signals to represent a
response from the slave.
• Control signals serve two purposes:
– Inform the master that the slave has recognized the address, and is ready to
participate in a data transfer operation.
– Enable to adjust the duration of the data transfer operation based on the
speed of the participating slaves.
• High-frequency bus clock is used:
– Data transfer spans several clock cycles instead of just one clock cycle as in the
earlier case.
10
4
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous
bus(contnd.)
Slave-ready signal is an acknowledgement from the slave to the master to confirm that the
valid data has been sent. Depending on when the slave-ready signal is asserted, the duration
of the data transfer can change.
Address & Tim
command e
requesting a Read 1 2 3 4
operation appear
on the bus.
C loc
k
Addres
s
C omman
d Master strobes data
into the input buffer.
Dat
a
Slave-
ready
Slave places the data on the Clock changes are seen by all the
bus,
and asserts Slave-ready signal. devices at the same
time. 34
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous Timing
Diagram
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous
bus
⚫ Data transfers on the bus is controlled by a handshake between the master and the
slave.
⚫ Common clock in the synchronous bus case is replaced by two timing control lines:
⚫ Master-ready,
⚫ Slave-ready.
⚫ Master-ready signal is asserted by the master to indicate to the slave that it is ready
to participate in a data transfer.
⚫ Slave-ready signal is asserted by the slave in response to the master-ready from the
master, and it indicates to the master that the slave is ready to participate in a data
transfer.
36
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous
bus(contnd.) T im
e
Address
and
command
Master-
ready
Dat
a
S lave-
t0 t1 t2 t3 t4 t5
ready
Bus cycle
t0 - Master places the address and command information on the bus.
t1 - Master asserts the Master-ready signal. Master-ready signal is asserted
at t1 instead of t0 t2 - Addressed slave places the data on the bus and asserts
the Slave-ready signal.
t3 - Slave-ready signal arrives at the master.
t4 - Master removes the address and command information.
and the Slave-ready
t5 - Slave receives the signal from of the Master-ready signal from 1 to 0. It 37
transition
the bus. BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous vs.
Synchronous bus
• Advantages of asynchronous bus:
– Eliminates the need for synchronization between the sender and the receiver.
– Can accommodate varying delays automatically, using the Slave-ready signal.
• Disadvantages of asynchronous bus:
– Data transfer rate with full handshake is limited by two-round trip delays.
– Data transfers using a synchronous bus involves only one round trip delay, and
hence a synchronous bus can achieve faster rates.
38
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous Timing – Read
Diagram
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous Timing – Write
Diagram
BITS Pilani, Deemed to be University under Section 3 of UGC
PCI
⚫ Peripheral Component Interconnect Bus
⚫ Introduced in 1992
⚫ Low-cost bus
⚫ Processor independent
⚫ Plug-and-play capability
⚫ In today’s computers, most memory transfers involve a burst of data rather than just one
word. The PCI is designed primarily to support this mode of operation.
⚫ The bus supports three independent address spaces: memory, I/O, and configuration.
⚫ we assumed that the master maintains the address information on the bus until data transfer
is completed. But, the address is needed only long enough for the slave to be selected. Thus,
the address is needed on the bus for one clock cycle only, freeing the address lines to be used
for sending data in subsequent clock cycles. The result is a significant cost reduction.
⚫ A master is called an initiator in PCI terminology. The addressed device that responds to read
and write commands is called a target.
BITS Pilani, Deemed to be University under Section 3 of UGC
PCI Bus
Arbiter
PCI makes use of a centralized, synchronous arbitration
scheme
each master has a unique request (REQ) and grant
(GNT) signal
does not dictate a particular arbitration algorithm
a first-come-first-served approach
a round-robin approach
BITS Pilani, Deemed to be University under Section 3 of UGC
or some sort of priority scheme.
PCI Bus
Arbitration
devices A and B are IRDY# s/t/s Initiator
arbitrating Ready.
Driven by current bus
master (initiator of
transaction). During a
read, indicates that the
master is prepared to
accept data; during a
write, indicates that
valid data are present
on AD.
TRDY# s/t/s Target
R eady.
Driven by the target
(selected device).
During a read,
indicates that valid data
are present on AD;
during a write, indicates
that target is ready to
BITS Pilani, Deemed to be University under Section 3 of UGC
PCI Bus Arbitration between Two
Masters
a. At some point prior to the start of clock 1, A has asserted its RE Q
signal. The
arbiter samples this signal at the beginning of clock cycle 1.
b. During clock cycle 1, B requests use of the bus by asserting its
R E Q signal.
c. At the same time, the arbiter asserts GNT-A to grant bus access to
A.
d. Bus master A samples GNT-A at the beginning of clock 2 and
learns that it has been granted bus access. It also finds IRDY and
TRDY de-asserted, indicating that the bus is idle. Accordingly, it
asserts FRAME and places the address information on the address
bus and the command on the C/BE bus (not
115
shown). It also continues to assert
BITS REQ-A, because
Pilani, Deemed it has
to be University under a second
Section 3 of
PCI Bus Arbitration between Two
Masters
e. The bus arbiter samples all REQ lines at the beginning of clock 3 and makes an
arbitration
decision to grant the bus to B for the next transaction. It then asserts GNT-B and
deasserts
GNT-A. B will not be able to use the bus until it returns to an idle state.
f. A deasserts FRAME to indicate that the last (and only) data transfer is in
progress. It puts
the data on the data bus and signals the target with IRDY.The target reads the
data at the
beginning of the next clock cycle.
g. At the beginning of clock 5, B finds IRDY and FRAME deasserted and so is able
to take
control of the bus by asserting FRAME. It also deasserts its REQ line, because it
only wants to
perform one transaction.
116
Subsequently, master A is granted access to BITS
the Pilani,
bus for its next
Deemed transaction.
to be University under Section 3 of
Progra
m
https://yasmin-cpu-os-
simulator.software.informer.com/6.1/
46
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani Campus
Chapter 9
• Computer
Arithmetic
Arithmetic & Logic Unit (ALU)
Part of the computer that actually performs arithmetic
and logical operations on data
All of the other elements of the computer system are
there mainly to bring data into the ALU for it to process
and then to take the results back out
Based on the use of simple digital logic devices that can
store binary digits and perform simple Boolean logic
operations
BITS Pilani, Deemed to be University under Section 3 of UGC
ALU Inputs and
outputs
Fig
9.1
BITS Pilani, Deemed to be University under Section 3 of UGC
Integer Representation
In the binary number system arbitrary numbers can be
represented with:
▪The digits zero and one
▪The minus sign (for negative numbers)
▪The period, or radix point (for numbers with a fractional
component)
▪For purposes of computer storage and processing we do not
have the benefit of special symbols for the minus sign and
radix point
▪Only binary digits (0,1) may be used to represent numbers
BITS Pilani, Deemed to be University under Section 3 of UGC
Signed Nos 2’s
complement
representation
Eg +6=
0110
-6= 1010
BITS Pilani, Deemed to be University under Section 3 of UGC
Consider any binary number in 2’s
complement 101101
= -26-1.1 + 24.0 + 23.1 + 22.1 +
21.0+ 20.1
=-32 + 0 + 8 + 4 + 0 + 1= -19
The 2’s complement of 101101 is
010011= 19
Now if a 1 is appended then
1101101 ?
1101101= -27-1.1
+ 25.1 + 24.0 + 23.1 +
22.1+ 21.0 + 20.1
= -64+ 32 +0 +8+4+0+1
BITS Pilani, Deemed to be University under Section 3 of UGC
Addition and Subtraction
OVERFLOW RULE: If two numbers are added, and they are both positive or
both negative, then overflow occurs if and only if the result has the opposite
sign.
BITS Pilani, Deemed to be University under Section 3 of UGC
Addition and Subtraction
SUBTRACTION RULE: To subtract one number (subtrahend) from
another (minuend), take the twos complement (negation) of the
subtrahend and add it to the minuend.
BITS Pilani, Deemed to be University under Section 3 of UGC
Block Diagram of Hardware for
Addition and Subtraction
OF= Overflow bit
SW =Switch
(select addition
or subtraction)
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
1.Multiplication involves the generation of partial products, one for
each digit in the multiplier. These partial products are then
summed to produce the final product.
2.The partial products are easily defined. When the multiplier
bit is 0, the partial product is 0.When the multiplier is 1, the
partial product is the multiplicand.
3.The total product is produced by summing the partial products.
For this operation, each successive partial product is shifted one
position to the left relative to the preceding partial product.
4.The multiplication of two n-bit binary integers results in a product
of up to 2n bits in length (e.g., 11 X 11=1001).
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add First
cycl
e
0 0101 1110 1011 Shift
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
0 1101 1111 1011 Add
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
0 1101 1111 1011 Add 3rd cycle
0 0110 1111 1011 shift
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
0 1101 1111 1011 Add 3rd cycle
0 0110 1111 1011 shift
1 0001 1111 1011 Add
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
0 1101 1111 1011 Add 3rd cycle
0 0110 1111 1011 shift
1 0001 1111 1011 Add 4th cycle
0 1000 1111 1011 Shift 143
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of SIGNED INTEGERS
11 (1011) by 13 (1101) to get 143
(10001111)
-5(1011) times -3 (1101) equals -113
(10001111)
BITS Pilani, Deemed to be University under Section 3 of UGC
Booth Multiplier
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial
value
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
00011 00000 1 11000
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand 10 A-M
Comments
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
00011 00000 1 11000
11011 00000 1 11000 A-M
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand 10 A-M
Comments
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
00011 00000 1 11000
11011 00000 1 11000 A-M
11101 10000 0 11000
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand 10 A-M
Comments
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
00011 00000 1 11000
11011 00000 1 11000 A-M
11101 10000 0 11000
11110 11000 0 11000 shift
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand 10 A-M
Comments
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
00011 00000 1 11000
11011 00000 1 11000 A-M
11101 10000 0 11000
11110 11000 0 11000 shift
11110 11000 -40 -512+472 Final result
BITS Pilani, Deemed to be University under Section 3 of UGC
The basic grammar school division algorithm
Division Algorithm and Hardware
The Divisor register, ALU, and
Remainder register are all 64 bits wide,
Dividend Quotient x Divisor
Remainder
with only the Quotient register being
32 bits.
The 32-bit divisor starts in the left
half of the Divisor register and is
• shifted right 1 bit each iteration.
The remainder is initialized with the
dividend. Control decides when to shift
the Divisor and Quotient
• registers and when to write the new
• value into the Remainder register.
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
2a: shift the quotient register to
the left , seting up the new
rightmost bit to 1
it takes n + 1 steps to get the
proper quotient and remainder
BITS Pilani, Deemed to be University under Section 3 of UGC
Division 7/2
Shift left logical
(sll) BITS Pilani, Deemed to be University under Section 3 of UGC
9/4
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
42
9/4
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 1001
43
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
15
5
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
15
6
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 0000
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
2 1: Rem= Rem-Div 0000 0010 0000 1110 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0010 0000 0000 1001
3: Shift Div Right 0000 0001 0000 0000 1001
15
7
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 0000
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
2 1: Rem= Rem-Div 0000 0010 0000 1110 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0010 0000 0000 1001
3: Shift Div Right 0000 0001 0000 0000 1001
3 1: Rem= Rem-Div 0000 0001 0000 1111 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0001 0000 0000 1001
3: Shift Div Right 0000 0000 1000 0000 1001
15
8
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 0000
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
2 1: Rem= Rem-Div 0000 0010 0000 1110 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0010 0000 0000 1001
3: Shift Div Right 0000 0001 0000 0000 1001
3 1: Rem= Rem-Div 0000 0001 0000 1111 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0001 0000 0000 1001
3: Shift Div Right 0000 0000 1000 0000 1001
4 1: Rem= Rem-Div 0000 0000 1000 0000 0001
2a: Rem>0 => sll Q, Q0=1 0001 0000 1000 0000 0001
3: Shift Div Right 0001 0000 0100 0000 0001
5 1: Rem= Rem-Div 0001 0000 0100 1111 1101
2b: Rem<0 => +Div, sll Q, Q0=0 0010 0000 0100 0000 0001
3: Shift Div Right 0010 0000 0010 0000 0001
15
9
Signed Division
Remember the signs of the divisor and dividend and then negate
the quotient if the signs disagree.
Dividend = Quotient Divisor + Remainder
Look at the example of
± 11 ÷ ±5
+11 = +2 𝑎𝑛𝑑 + −11 = −2 𝑎𝑛𝑑 −
+5 +5
1; 1
The 1st case is easy
+11 ÷ +5: Quotient = +2, Remainder = +1
Checking the results:
11 = 2 × 5 + (+1) = 10 + 1
BITS Pilani, Deemed to be University under Section 3 of UGC
Signed Division
Dividend = Quotient Divisor +
+11 −11
± 11 ÷ = +2 𝑎𝑛𝑑 + = −2 𝑎𝑛𝑑
Remainder
±5 +5 1;
+5 −1
2nd case If we change the sign of the dividend, the quotient must
change as well:
–11 ÷ +5: Quotient = –2
Rewriting our basic formula to calculate the remainder:
Remainder = (Dividend – Quotient × Divisor) = –11 – (–2 × +5)
= –11–(–10) = –1
So,
–11 ÷ +5: Quotient = –2, Remainder = –1
BITS Pilani, Deemed to be University under Section 3 of UGC
Signed Division
Dividend = Quotient Divisor +
+11 −11
± 11 ÷ = +2 𝑎𝑛𝑑 + = −2 𝑎𝑛𝑑
Remainder
±5 + 1; +
5 5 −1
Checking the results again:
–11 = – 2 5 + (–1) = – 10 – 1
The same result is obtained if
–11 = – 3 5 + (4) = – 15 + 4
absolute value of the quotient would then change
How to avoid this?
BITS Pilani, Deemed to be University under Section 3 of UGC
Signed Division
Dividend = Quotient Divisor +
± 11 ÷
Remainder +11 = +2 𝑎𝑛𝑑 + −11 = −2 𝑎𝑛𝑑 −
+5 +5
±5 1; 1
This anomalous behavior is avoided by following the rule that the
dividend and remainder must have the same signs, no matter what
the signs of the divisor and quotient.
+11
−2 𝑎𝑛𝑑 + +11 ÷ –5:
−5
1
−11
Quotient = –2,
= = +2 𝑎𝑛𝑑
−5 –11 ÷ –5:
Remainder = +1
−1
; Quotient = +2,
Remainder = –1
Thus the correctly signed division algorithm negates the quotient if
the signs of the operands are opposite and makes the sign of the
nonzero remainder match the dividend
BITS Pilani, Deemed to be University under Section 3 of UGC
Real
Numbers
Numbers with fractions
Could be done in pure
binary
▪1001.1010 = 24 + 20
+2-1 + 2-3 =9.625
Where is the binary
point? Fixed?
▪Very
limited
Moving? BITS Pilani, Deemed to be University under Section 3 of UGC
Floating
Point
• Normalized scientific notation: single non-zero
digit to the left of the decimal (binary) point –
example: 3.5 x 109
•1.010001 x 2-5two = (1 + 0 x 2-1 + 1 x 2-2 + … + 1
x 2-6) x 2-5ten
•A standard notation enables easy exchange of data
between machines and simplifies hardware
algorithms – the
IEEE 754 standard defines how floating point
numbers are represented
BITS Pilani, Deemed to be University under Section 3 of UGC
Sign and Magnitude
Representation
Sig Expone Fractio
1
n nt 8 n 23
bit
S bits E bits F
• More exponent bits wider range of numbers (not necessarily
more
numbers – recall there are infinite real
numbers)
• More fraction bits higher precision
• Register value = (-1)S x F x 2E
•Since we are only representing normalized numbers,
we are guaranteed that the number is of the form
1.xxxx.. BITS Pilani, Deemed to be University under Section 3 of UGC
Sign and Magnitude
Representation
Sig Expone Fractio
1
n nt 8 n 23
bit
S bitsE bitsF
• Largest number that can be
represented:
• Smallest number that can be represented:
0.000000001ten or 1.0ten × 109 (seconds in a
nanosecond) 3,155,760,000ten or 3.15576ten × 109
(seconds in a typical
century) BITS Pilani, Deemed to be University under Section 3 of UGC
Biased Exponent Representation
How to represent a signed exponent? Choices are …
Sign + magnitude representation for the exponent
Two’s complement representation
Biased representation
IEEE 754 uses biased representation for the exponent
Value of exponent
= val(E) = E – Bias (Bias is a constant)
Recall that exponent field is 8 bits for single precision
E can be in the range 0 to 255
E = 0 and E = 255 are reserved for special use
(discussed later)
E = 1 to 254 are used for normalized floating point numbers
Bias = 127 (half of 254), val(E) = E – 127
val(E=1) = –126, val(E=127)
BITS Pilani, Deemed= 0,University
to be val(E=254) =3 of127
under Section UGC
Biased Exponent –
Cont’d
Value of a Normalized Floating Point Number is
(–1) S × (1.F)2 × 2E – Bias
(–1) S × (1.f1f2f3f4 … ) 2 × 2E – Bias
(–1) S × (1 + f1×2-1 + f2×2-2 + f3×2-3 + f4×2-4 … ) 2 × 2E
– Bias
BITS Pilani, Deemed to be University under Section 3 of UGC
Examples of Single Precision
Float
What is the decimal value of this Single Precision float?
10111110001000000000000000000
000
Solution:
Sign = 1 is negative
Exponent = (01111100)2 = 124, E – bias = 124 –
127 = – 3
O Significand
Significand= (1.0100 … 0)
= (1.0100 …2 X0)22 –=
3 = 0.0010100=
1 + 2-2 = 1.25 (1. is
r –0.15625
implicit)
What is the decimal value of?
0 Value in decimal = –1.25 × 2 – 3 = –0.15625
100000100100110000000000000
0000
Solution: Implicit
Value in decimal = +(1.01001100 … 0)2 ×
2130–127
= (1.01001100 … 0)
BITS Pilani, Deemed to 2be× 23 =under
University (1010.01100
Section 3 of UGC … 0)2
Converting FP Decimal to
Binary
Convert –0.8125 to binary in single
precision
Solution:
=
Fraction bits 1.25can be obtained using multiplication by
0.8125 × 2== 0.51.625
0.625 × 2 = 1.0
0.8125
0.25=× (0.1101)2 = ½ + ¼ + 1/16 = 13/16
2
0.5 ×
Fraction 2
= (0.1101)2 = (1.101)2 × 2
– 1 (Normalized)
Stop when fractional part is 0
Exponent = – 1 + Bias = 126 (single precision)= 7E H
101111110101000000000000000
00000 BITS Pilani, Deemed to be University under Section 3 of UGC
Largest Normalized Float
What is the Largest normalized float?
Solution for Single Precision:
011111110111111111111111111
11111
Exponent – bias = 254 – 127 = 127 (largest
exponent for SP)
Significand = (1.111 … 1)2 = almost 2
Value in decimal ≈ 2 × 2127 ≈ 2128 ≈ 3.4028 … ×
1038
Overflow: exponent is too large to fit in the
exponent field BITS Pilani, Deemed to be University under Section 3 of UGC
Smallest Normalized Float
What is the smallest (in absolute value) normalized
float?
Solution for Single Precision:
00000000100000000000000000000
000
Exponent – bias = 1 – 127 = –126 (smallest
exponent for SP)
Significand = (1.000 … 0)2 = 1
Value in decimal = 1 × 2 –126 = 1.17549 … ×
10 –38
BITS Pilani, Deemed to be University under Section 3 of UGC
Zero, Infinity, and NaN
Zero
Exponent field E = 0 and fraction F = 0
+0 and – 0 are possible according to
sign bit S
Infinity
Infinity is a special value represented with maximum
E and
F=0
For single precision with 8-bit exponent: maximum E
= 255
Infinity can result from overflow or division by zero
+∞ and – ∞ are possible according to sign bit S
NaN (Not a Number)
NaN is a special value represented with maximum E
and BITS Pilani, Deemed to be University under Section 3 of UGC
Summary of IEEE 754
Encoding
BITS Pilani, Deemed to be University under Section 3 of UGC
Floating Point Addition
Example
Consider Adding (Single-Precision Floating-
Point):
+ 1.111001000000000000000102 × 24
+ 1.100000000000001100001012 × 22
Cannot add significands … Why?
Because exponents are not equal
How to make exponents equal?
Shift the significand of the lesser exponent
right
Difference between the two exponents = 4 – 2
=2
So, shift right second number by 2 bits and
increment exponent
1.100000000000001100001012 × 22 =
BITS Pilani, Deemed to be University under Section 3 of UGC
4
Floating-Point Addition –
cont'd
Now, ADD the Significands:
+ 1.11100100000000000000010 × 24
+ 1.10000000000000110000101 × 22
----------------------------------------------------------------------------------------------------------------------------- -
+ 1.11100100000000000000010 × 24
+ 0.01100000000000001100001 01 × 24 (shift right)
+10.01000100000000001100011 01 × 24 (result)
Addition produces a carry bit, result is NOT
normalized
Normalize Result (shift right and increment
exponent):
+ 10.01000100000000001100011 01 × 24
= + 1.00100010000000000110001 101 × 25
BITS Pilani, Deemed to be University under Section 3 of UGC
Rounding
Single-precision requires only 23 fraction bits
However, Normalized result can contain additional
bits
1.00100010000000000110001 | 1 01 × 25
Round Bit: R = 1
Sticky Bit: S = 1
Two extra bits are needed for rounding
Round bit: appears just after the normalized result
Sticky bit: appears after the round bit (OR of all
additional bits)
Since RS = 11, increment fraction to round to
nearest
1.00100010000000000110001 × 25
+1
----------------------------------------------
BITS Pilani, Deemed to be University under Section 3 of UGC
5
If the sign bits of the operands are different
Consider Adding:
+ 1.00000000101100010001101 × 2-6
– 1.00000000000000010011010 × 2-1
-------------------------------------------------------------------------------
+ 0.00001000000001011000100 01101 × 2-1 (shift right 5
bits)
– 1.00000000000000010011010 × 2-1
---------------------------------------------------------------------------------
0 0.00001000000001011000100 01101 × 2-1
1 0.11111111111111101100110 × 2-1 (2's
complement)
1 1.00001000000001000101010 01101 × 2-1 (ADD)
- 0.11110111111110111010101 10011 × 2-1 (2's
complement)
2's complement of result is required if result is
negative BITS Pilani, Deemed to be University under Section 3 of UGC
Subtraction FP
contnd.
+ 1.00000000101100010001101 × 2-6
– 1.00000000000000010011010 × 2-1
- 0.11110111111110111010101 10011 × 2-1 (result is
negative)
Result should be normalized
For subtraction, we can have leading zeros. To
normalize, count the number of leading zeros,
then shift result left and decrement the exponent
accordingly.
Guard bit
- 0.11110111111110111010101 1 0011 × 2-1
- 1.11101111111101110101011 0011 × 2-2 (Normalized)
Guard bit
Guard bit: guards against loss of a fraction bit
Needed for subtraction, when result has a leading
zero and should beBITSnormalized.
Pilani, Deemed to be University under Section 3 of UGC
Subtraction FP contnd.
Next, normalized result should be
rounded
Guard bit
011××2-12-2
0.11110111111110111010101 1 0 0011
- 1.11101111111101110101011
Round bit: R=0 (Normalized)
Sticky bit: S = 1
Since R = 0, it is more accurate to truncate the
result even if S = 1. We simply discard the extra
bits.
- 1.11101111111101110101011 0 011 × 2-2
(Normalized)
- 1.11101111111101110101011 × 2-2 (Rounded to
nearest) Exponent -2 is biased to -2+127= 125=
7D H
IEEE 754 Representation
BITS Pilani, Deemed to be University under Section 3 of UGC
of Result
Rounding to Nearest
Even
Normalized result has the form: 1. f1 f2 … fl R S
The round bit R appears after the last fraction bit
fl
The sticky bit S is the OR of all remaining
additional bits
Round to Nearest ResultEven:
is Exact, no need
default for rounding
rounding mode
Four cases for Truncate
RS: result by discarding RS
RS = 00 Increment result: ADD 1 to last
RS = 01 fraction bit
RS = 11 Tie Case (either truncate or
RS Check Lastincrement
= 10 fraction bitresult)
fl (f23 for single-precision or
f52 for double)
If fl is 0 then truncate result to keep fraction even
If fl is 1 then increment result to make fraction even
BITS Pilani, Deemed to be University under Section 3 of UGC
Floating Point Multiplication Example
Consider multiplying:
-1.110 1000 0100 0000 1010 00012 × 2–4
× 1.100 0000 0001 0000 0000 00002 × 2–2
Unlike addition, we add the
exponents of the operands
Result exponent value = (–4) + (–
2) = – 6
Using the biased representation: EZ
= EX + EY – Bias
EX = (–4) + 127 = 123 (Bias = 127
for single precision)
EY = (–2) + 127 = 125
EZ = 123 + 125 – 127 = 121
(value = –6)
Sign bit of product can be
BITS Pilani, Deemed to be University under Section 3 of UGC
FP Multiplication contnd.
Now multiply the
(Multiplicand)
significands: 1.11010000100000010100
(Multiplier) 001
×
111010000100000010100001
1.1000000000100000000
111010000100000010100001 0000
1.11010000100000010100001
10.101110001111101111110011001010000100
0000000000
24 bits × 24 bits 48 bits (double number
of bits)
Multiplicand × 0 = 0 Zero rows are eliminated
Multiplicand × 1 = Multiplicand (shifted left)
BITS Pilani, Deemed to be University under Section 3 of UGC
FP Multiplication contnd.
Normalize Product:
-10.10111000111110111111001100... × 2-6
Shift right and increment exponent because of carry
bit
= -1.010111000111110111111001100... × 2-5
Round to Nearest Even: (keep only 23 fraction
bits)
1.01011100011111011111100 | 1 100... × 2-5
Round bit = 1, Sticky bit = 1, so increment
fraction Final result = -
1.01011100011111011111101 × 2-5
IEEE 754 Representation
1 0 1 1 1 1 0 1 0 BITS
01 011100011111
Pilani, Deemed to be University under Section 3 of UGC
Examples
Final representation: (-1)S x (1 + Fraction) x 2(Exponent
– Bias)
• Represent -0.75ten in single and double-precision
formats
Single: (1 + 8 + 23)
Double: (1 + 11 + 52)
•What decimal number is represented by the
following single-precision number?
1 1000 0001 01000…0000
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani Campus
•DS Rao COA -IS ZC353
Objectives
Machine instructions and program
execution, including branching and
subroutine call and return operations.
Addressing methods for accessing
register and memory operands.
Assembly language for representing
machine instructions, data, and
programs.
Program-controlled Input/Output
operations. BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and
Operation n
bits
first word
Memory consists second
of many millions word
of storage cells, •
each of which •
•
can store 1 bit.
i th
Data is usually word
accessed in n- •
bit groups. n is •
•
called word
length. last
word
Figure Memory
BITS Pilani, Deemed to be 2.5.
University underwords.
Section 3 of UGC
Memory Location, Addresses, and Operation
32-bit word length example
32 bits
b3 b3
b1 b0
•
1 0
•
•
Sign bit:b31= 0 for positive numbers
b31= 1 for negative numbers
(a) A signed integer
8 bits 8 bits 8 bits 8 bits
ASCII ASCII ASCII ASCII
character character character
character
(b) Four characters
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and Operation
To retrieve information from memory, either for one
word or one byte (8-bit), addresses for each location
are needed.
A k-bit address memory has 2k memory locations,
namely 0 to 2k-1, called memory space.
24-bit memory: 224 = 16,777,216 = 16M (1M=220)
32-bit memory: 232 = 4G (1G=230)
1K(kilo)=210
1T(tera)=240
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and
Operation
It is impractical to assign distinct addresses to
individual bit locations in the memory.
The most practical assignment is to have
successive addresses refer to successive byte locations
in the memory – byte- addressable memory.
Byte locations have addresses 0, 1, 2, …
If word length is 32 bits, they successive words
are located at addresses 0, 4, 8 , …
BITS Pilani, Deemed to be University under Section 3 of UGC
Big-Endian and Little-Endian
Assignments
Big-Endian: lower byte addresses are used for the most significant bytes of
the word
Little-Endian: opposite ordering. lower byte addresses are used for the less
significant bytes of the word
Word
addres Byte Byte
s address address
0 0 1 2 3 0 3 2 1 0
4 4 5 6 7 4 7 6 5 4
• •
• •
• •
k k k k
k k k k k k
2 - 2 -4 2 -3 2- 2 - 1 2 - 2- 2 - 2 2 - 3 2 -4
4 2 4 1
(a) Big-endian (b) Little-endian
assignment assignment
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and
Operation
Address ordering of bytes
Word alignment
Words are said to be aligned in memory if they begin at a byte
address. that is a multiple of the number of bytes in a word.
– 16-bit word: word addresses: 0, 2, 4 , ….
– 32-bit word: word addresses: 0, 4, 8 , ….
– 64-bit word: word addresses: 0, 8,16,….
Memory Operation
Load (or Read or Fetch)
Copy the content. The memory content doesn’t change.
Address – Load
Registers can be
used Store (or Write)
Overwrite the
content in memory
Address and Data –
Store
BITS Pilani, Deemed to be University under Section 3 of UGC
“Must-Perform”
Operations
Data transfers between the memory and the processor
registers
Arithmetic and logic operations on data
Program sequencing and control
I/O transfers
Assembly Language Notation
Represent machine instructions and
programs.
Move LOC, R1 = R1←[LOC]
Add R1, R2, R3 = R3 ←[R1]+[R2]
or
Add R1, R2, R3 = R1 ←[R2]+[R3]
BITS Pilani, Deemed to be University under Section 3 of UGC
CPU
Organization
Single Accumulator
Result usually goes to the Accumulator
Accumulator has to be saved to
memory quite often
General Register
Registers hold operands thus reduce
memory traffic
Register book keeping
Stack
Operands and result are always in the
stack
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction
Formats
Opcode Operand(s) or Address(es)
Three-Address Instructions
▪ADD R1, R2, R3 R1 ← R2 + R3 or R3 ← R1
+ R2
Two-Address Instructions
▪ADD R1, R2 R1 ← R1 + R2
One-Address Instructions
▪ADD M AC ← AC + M[AR]
Zero-Address
▪ADD Instructions
TOS ← TOS + (TOS –
RISC Instructions 1)
▪Lots of registers. Memory is restricted to Load
& Store
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats
Example: Evaluate (A+B)
(C+D)
Three-Address
1. ADD R1, A, B ; R1 ← M[A] +
2. ADD M[B]
R2, C, D
; R2 ← M[C] +
M[D]
3. MUL X, R1, R2 ; M[X] ← R1 R2
Example: Evaluate (A+B)
(C+D) Two-Address
1. MOV R1, A ; R1 ← M[A]
2. ADD R1, B ; R1 ← R1 +
M[B]
3. MOV R2, C ; R2 ← M[C]
4. ADD R2, D ; R2 ← R2 +
M[D]
5. MUL R1, R2 ; R1 ← R1 R2
6. MOV X, R1 ; M[X] ← R1
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction
Formats
Exampl Evaluate (A+B) (C+D) One-
e: 1. Address
LOAD A ; AC ← M[A]
2. ADD B ; AC ← AC +
3. STORE M[B]
T
4. LOAD ; M[T] ← AC
5. ADD C
; AC ← M[C]
6. MUL D
7. STORE ; AC ← AC +
T
M[D]
X
; AC ← AC M[T]
; M[X] ← AC
Example: Evaluate (C+D) RISC
(A+B)
1. LOAD R1, A ; R1 ← M[A]
2. LOAD R2, B ; R2 ← M[B]
3. LOAD R3, C ; R3 ← M[C]
4. LOAD R4, D ; R4 ← M[D]
5. ADD R1, R1, R2 ; R1 ← R1 + R2
6. ADD R3, R3, R4 ; R3 ← R3 + R4
7. MUL R1, R1, R3 ; R1 ← R1 R3
8. X, R1 ; M[X] ← R1
STORE BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Data
Types
8 (byte), 16 (halfword), 32 (word) bits
Halfword and word accesses should be word
aligned Nonaligned access alternatives
▪Default
– Treated as truncated
– Bits[1:0] treated as zero for word
– Bit[0] treated as zero for halfword
– Load single word instructions rotate right word aligned data
transferred by non word-aligned address one, two or three
bytes Alignment checking
▪Data abort signal indicates alignment fault for attempting
unaligned access
▪Unaligned access
▪Processor uses one or more memory accesses to generate
transfer of adjacent bytes transparently
BITS Pilani, to the
Deemed to be University underprogrammer
Section 3 of UGC
ARM Data Types
contnd.
Unsigned integer interpretation supported for all types
Twos-complement signed integer interpretation supported for
all types
Majority of implementations do not provide floating-point
hardware
▪Saves power and area
▪Floating-point arithmetic implemented in software
▪Optional floating-point coprocessor
▪Single- and double-precision IEEE 754 floating point data
types
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Operation Types
The ARM architecture provides a large collection of operation types
The following are the principal categories:
Load and store instructions: only load and store instructions access
memory
locations; arithmetic and logical instructions are performed only
on registers and immediate values encoded in the instruction
A Branch instructions: Allows a conditional branch forwards or
backwards up to
32 MB.
Data-processing instructions:includes logical instructions (AND,
OR, XOR), add and subtract instructions, and test and compare
instructions
Multiply instructions: The integer multiply instructions
operate on word or halfword operands and can produce normal
BITS Pilani, Deemed to be University under Section 3 of UGC
or long results
ARM Operation
Types contnd.
Parallel addition and subtraction instructions
Extend instructions: There are several instructions for
unpacking data
by sign or zero extending bytes to halfwords or words, and
halfwords to words.
Status register access instructions: ARM provides the ability
to read and
also to write portions of the status register.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Endian Support
E-bit in system control
register Under program
control
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Using
Registers
Registers are faster
Shorter instructions
– The number of registers is smaller (e.g. 32 registers
need 5 bits)
Potential speedup
Minimize the frequency with which data is moved back
and forth between the memory and processor registers.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Condition
Codes
Condition code flags
Condition code register / status
register N (negative)
Z (zero)
V
(overflow)
C (carry)
Different
instruction
s affect
different
flags BITS Pilani, Deemed to be University under Section 3 of UGC
Conditional Branch
Instructions
Example: A: 11110000
▪A: 1 1 1 1 0 0
00
▪B: 0 0 0 1 0 1
+(−B): 1 1 1 0 1 1 0
00 0
11011100
C=1 Z=0
S=1
V=0
Overflow can only happen when adding two numbers of the same sign
and getting a different sign. So, to detect overflow we don't care about
any bits except the sign bits. Ignore the other bits.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Conditions for Conditional Instruction
Execution
BITS Pilani, Deemed to be University under Section 3 of UGC
Expression Evaluation
Infix notation: A binary operator appears between the operands
𝑎 + (𝑏 × 𝑐), will yield a different result from (𝑎 + 𝑏) × 𝑐
Reverse Polish, or postfix,
notation a + b becomes a
b+
𝑎 + (𝑏 × 𝑐),
becomes a b c x +
(𝑎 + 𝑏) × 𝑐 becomes
ab+cx
Regardless of the complexity of an expression, no parentheses are
required when using reverse Polish
An expression in this form is easily evaluated using a stack
Rules
1. If the element is a variable or constant, push it onto the stack.
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats
TOS= TOS (operation)
(TOS-1)
Example: Evaluate (A+B)
(C+D) Zero-Address
1. PUSH ; TOS ← A
A ; TOS ← B B
2. PUSH ; TOS ← (A + B) A
B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats
Example: Evaluate (A+B)
(C+D) Zero-Address
1. PUSH ; TOS ← A D
A ; TOS ← B C
2. PUSH ; TOS ← (A + B)
A+B
B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction
Formats
Example: Evaluate (A+B)
(C+D) Zero-Address
1. PUSH ; TOS ← A
A ; TOS ← B C+D
2. PUSH ; TOS ← (A + B) A+B
B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction
Formats
Example: Evaluate (A+B)
(C+D) Zero-Address
1. PUSH ; TOS ← A
A ; TOS ← B
2. PUSH ; TOS ← (A + B) (C+D)*(A+B)
B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani | Dubai | Goa |
Hyderabad
Already included this in 8086 , students have to learn this on
their own
Addressing Modes
BITS Pilani, Deemed to be University under Section 3 of UGC
• 8086 Intel Microprocessor
Architecture
BITS Pilani, Deemed to be University under Section 3 of UGC
Generating Memory
Addresses
How to specify the
address of branch
target?
Can we give the memory operand address
directly in a single Add instruction in the
loop?
Use a register to hold the address of
NUM1; then increment by 4 on each pass
through the loop.
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Opcode Mode ...
Implied
▪AC is implied in “ADDM[AR]” in “One-
Address” instr.
▪TOS is implied in “ADD” in “Zero-
Address” instr. Immediate
▪The use of a constant in R1, 5”, i.e. R1
“MOV 5 ←
Register
▪Indicate which register
holds the operand
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Register Indirect
▪Indicate the register that holds the number of the register
that holds
the
MOVoperand
R1,
Autoincrement
(R2) / Autodecrement R1
▪Access & update in 1
instr. Direct Address R2 = 3
▪Use the given address
to access a memory R3 = 5
location
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Indirect Address
▪Indicate the memory location that holds the
address of the memory location that holds the
data
AR = 101
100
101 0 1 0 4
102
103
104 1 1 0 A
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Relative Address
▪EA = PC + Relative 0
Addr 1
PC = 2 2
+
100
AR = 100
101
Could be 102 1 1 0
Positive or A
Negative 103
(2’s 104
Complemen
t)
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Indexe
d
▪EA = Index Register + Relative
Addr
Useful with
“Autoincrement” XR = 2
or
“Autodecrement” +
100
AR = 100
101
102 1 1 0
Could be A
Positive or 103
Negative 104
(2’s
Complemen
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Base Register
▪EA = Base Register + Relative
Addr
Could be Positive or AR = 2
Negative
(2’s Complement)
+
100 0 0 0 5
BR = 100
101 0 0 1 2
102 0 0 0 A
Usually points to 103 0 1 0 7
the beginning of 104 0 0 5 9
an array
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Indexing
Methods
RISC machine, unlike a CISC
machine has simple and relatively
straightforward set of addressing
modes
ARM architecture departs somewhat
from this tradition by providing a
relatively rich set of addressing
modes.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Addressing Modes
Load/Store
Only instructions that reference memory
Indirectly through base register plus offset
Offset
Offset added to or subtracted from base register
contents to form the memory address
Preindex
Memory address is formed as for offset addressing
Memory address also written back to base register
So base register value incremented or decremented by
offset value BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Addressing Modes
Load/Store
Postindex
Memory address is base register value
Offset added or subtracted
Result written back to base register
Base register acts as index register for preindex and postindex
addressing
Offset either immediate value in instruction or another register
If register scaled register addressing available
Offset register value scaled by shift operator
Instruction specifiesBITS
shift size
Pilani, Deemed to be University under Section 3 of UGC
ARM Data Processing Instruction
Addressing & Branch
Instructions
Data Processing
Register addressing
– Value in register operands may be scaled using a
shift operator
Or mixture of register and immediate addressing
Branch
Immediate
Instruction contains 24 bit value
Shifted 2 bits left
– On word boundary
– Effective range +/-32MB from PC.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Load/Store
Multiple Addressing
Load/store subset of general-purpose registers
16-bit instruction field specifies list of registers
Sequential range of memory addresses
Increment after, increment before, decrement after, and
decrement
before
Base register specifies main memory address
Incrementing or decrementing starts before or after
first memory access
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Load/Store
Multiple Addressing
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats
Layout of bits in an instruction
Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an instruction
set
Instruction Length
Affected by and affects:
Memory size
Memory organization
Bus structure
CPU complexity
CPU speed
Trade off between powerful instruction repertoire and
saving space BITS Pilani, Deemed to be University under Section 3 of UGC
Allocation
ofaddressing
Number of Bits
modes
Number of operands
Register versus memory
Number of register sets
Address range
Address granularity
BITS Pilani, Deemed to be University under Section 3 of UGC
In class Example-1
Assume an instruction set that uses a fixed 16-bit instruction length. Operand
specifiers are 6 bits in length. There are K two-operand instructions and L zero-
operand instructions. What is the maximum number of one-operand instructions?
BITS Pilani, Deemed to be University under Section 3 of UGC
In class Example-1
Design a variable-length opcode to allow all of the following to be encoded
in a 36-bit instruction:
• instructions with two 15-bit addresses and one 3-bit register number
• instructions with one 15-bit address and one 3-bit register number
• instructions with no addresses or registers
BITS Pilani, Deemed to be University under Section 3 of UGC
In class Example-1
Design a variable-length opcode to allow all of the following to be encoded
in a 36-bit instruction:
• instructions with two 15-bit addresses and one 3-bit register number
• instructions with one 15-bit address and one 3-bit register number
• instructions with no addresses or registers
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM
Instruction
Formats
S = For data processing instructions, updates condition codes
S = For load/store multiple instructions, execution restricted to
supervisor mode P, U, W = distinguish between different types of
addressing_mode
B = Unsigned byte (B==1) or word (B==0) access
L = For load/store instructions, Load (L==1) or Store (L==0)
BITS Pilani, Deemed to be University under Section 3 of UGC
L = For branch instructions, is return address stored in link register
Examples of Use of ARM Immediate
Contants
BITS Pilani, Deemed to be University under Section 3 of UGC
Expanding a Thumb ADD Instruction into its
ARM Equivalent
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
CPU simulator as example
BITS Pilani, Deemed to be University under Section 3 of UGC
How to access your Virtual lab
portal?
1. Login into your BITS WILP Elearn portal (https://elearn.bits-
pilani.ac.in/)
2. Click on "My Virtual Labs" under Important Links
Can also be downloaded from the below link and installed on your
PC, it is a free tool
Downloads – CPU-OS Simulator (teach-sim.com)
MSI Installer version for use by Windows Installer in silent mode.
Used to
distribute and install across several
systems. CPU-OS Simulator
7-5-50-msi.zip
BITS Pilani, Deemed to be University under Section 3 of UGC
Example
s
1.locate the instruction, which is used to store one byte of data in a
memory location. Use it to store number 65 in address location 20 (all
numbers are in decimal). This is an example of direct addressing.
STB #65,20
2.Create an instruction to move decimal number 22 to register R01 and
make a note of it below. Execute this instruction and verify the result in
R01.
MOV #22,R01
3.Create an instruction to store decimal number 51 in
memory location the address of which is currently stored in register R01.
This is an example of indirect addressing. Note the use of the “@” prefix
next to R01 in this case.
STB #51,@R01
4. Make a note of what you see in data memory locations 20 and 22
BITS Pilani, Deemed to be University under Section 3 of UGC
3.Create an instruction to store decimal number
51 in memory location the address of which is
currently stored in register R01. This is an
example of indirect addressing. Note the use of
the “@” prefix next to R01 in this case.
STB #51,@R01
4.Make a note of what you see in data
memory locations 20 and 22
BITS Pilani, Deemed to be University under Section 3 of UGC
Creating A Loop
Now, let’s create a loop. First, enter the following code.
The # prefix is used to denote a literal value thus
distinguishing it from an address value which does not
use it.
MOV #0, R01
ADD #1, R01
CMP #5, R01
JNE 0
HLT
in order to make the code more flexible we can use
labels to represent instruction addresses
BITS Pilani, Deemed to be University under Section 3 of UGC
Naming the LOOP
Type label name L0 in the box next to the ENTER
LABEL
button in the window you use to enter instructions
Click the ENTER LABEL button
The new code should now look like this
(modifications are in red colour):
MOV #0, R01
L0:
ADD #1, R01
CMP #5, R01
JNE $L0
HLT
BITS Pilani, Deemed to be University under Section 3 of UGC
Elements of a machine instruction
Opcode
Operand
Direct
operand references
Eg: Add A 87H
or Add A B
Next Instruction
Instruction Set Design Issue
Type of Operation ?
Kind of Data ?
Instruction Format ?
Number of internal Registers in
CPU ?
Addressing Modes?
BITS Pilani, Deemed to be University under Section 3 of UGC
Questions on simulator
Type of Operation ?
Kind of Data ?
Instruction Format ?
Number of internal Registers in
CPU ?
Addressing Modes?
BITS Pilani, Deemed to be University under Section 3 of UGC
Assembler
Machines store and understand binary
instructions
E.g. N= I + J + K initialize I=2, J=3, K=4
Program starts in location 101
Data starting 201
Code:
Load contents of 201 into AC
Add contents of 202 to AC
Add contents of 203 to AC
Store contents of AC to 204
Tedious and error prone
BITS Pilani, Deemed to be University under Section 3 of UGC
Improvements
Use hexadecimal rather than binary
Code as series of lines
Hex address and memory address
Need to translate automatically using
program
Add symbolic names or mnemonics for
instructions
Three fields per line
Location address
Three letter opcode
If memory reference: address
Need more complex translation program
BITS Pilani, Deemed to be University under Section 3 of UGC
Program
in: Binary Hexadecim
al
Address Contents Address Contents
101 0010 0010 101 2201 101 2201
102 0001 0010 102 1202 102 1202
103 0001 0010 103 1203 103 1203
104 0011 0010 104 3204 104 3204
201 0000 0000 201 0002 201 0002
202 0000 0000 202 0003 202 0003
203 0000 0000 203 0004 203 0004
204 0000 0000 204 0000 204 0000
BITS Pilani, Deemed to be University under Section 3 of UGC
Symbolic
Addresses
First field (address) now symbolic
Memory references in third field now symbolic
Now have assembly language and need an assembler to
translate
Assembler used for some systems programming
▪Compliers
▪I/O routines
BITS Pilani, Deemed to be University under Section 3 of UGC
Computation of the
Formula N = I + J + K
BITS Pilani, Deemed to be University under Section 3 of UGC
Adressing Mode In class
example
Given 3values and a one-address machine with
the following memory
an accumulator, what values do the following instructions load into
the accumulator?
• Word 20 contains 40.
• Word 30 contains 50.
• Word 40 contains 60.
• Word 50 contains 70.
a. LOAD IMMEDIATE 20
b. LOAD DIRECT 20
c. LOAD INDIRECT 20
d. LOAD IMMEDIATE 30
e. LOAD DIRECT 30
f. LOAD INDIRECT 30
a.
Sol:2
0 d. 30 e. 50 f .
b. 4 70
0
c. 6
0
BITS Pilani, Deemed to be University under Section 3 of UGC
In class example 4
Consider a 16-bit processor in which the following appears in main memory, starting at
location 200:
The first part of the first word indicates
that this instruction loads a value into
an accumulator.
The Mode field specifies an addressing mode and, if appropriate, indicates a source register;
assume that when used, the source register is R1, which has a value of 400.There is also a
base register that contains the value 100.The value of 500 in location 201 may be part of the
address calculation. Assume that location 399 contains the value 999, location 400 contains
the value 1000, and so on. Determine the effective address and the operand to be loaded for
the following address modes:
BITS Pilani, Deemed to be University under Section 3 of UGC
Contnd.
Determine the effective address and the operand to be
loaded for the following address modes
a. Direct d. PC relative g. Register indirect
b. Immediate e. Displacement,
c. Indirect f. Register
h. Autoindexing with increment using R1
Mode→ R1 and contains A 500 1100
B 201 500
400 Base register 100
C 1100 1700
Adrs 399 → 999 & 400→ D 702 1302
1000 E 600 1200
F R1 400
G 400 1000
H 400 1000
BITS Pilani, Deemed to be University under Section 3 of UGC
In class example 5
Assume a stack-oriented processor that includes the stack operations PUSH
and POP. Arithmetic operations automatically involve the top one or two
stack elements. Begin with an empty stack.What stack elements remain
after the following instructions
are
executed?
PUSH 7 Push 4 4
Push 7 7,4
PUSH 8
4 Push 8 8,7,4
ADD ADD 15,4
PUSH 10,15,
PUSH 10 SUB 4
10 SUB MUL 5,4
20
MUL
BITS Pilani, Deemed to be University under Section 3 of UGC
In class example 6
Compare zero-, one-, two-, and three-address machines by writing
programs to compute
for each of the four machines.The instructions available for use are
as follows:
X = (A + B X C) ∕ (D - E X F)
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
BITS Pilani, Deemed to be University under Section 3 of UGC
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
C
B
A
X = (A + B X C) ∕ (D - E
X F)
BITS Pilani, Deemed to be University under Section 3 of UGC
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
BXC
A
X = (A + B X C) ∕ (D - E
X F)
BITS Pilani, Deemed to be University under Section 3 of UGC
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
F
E
D
A+BXC
X = (A + B X C) ∕ (D - E
X F)
BITS Pilani, Deemed to be University under Section 3 of UGC
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
ExF
D
A+BXC
X = (A + B X C) ∕ (D - E
X F)
BITS Pilani, Deemed to be University under Section 3 of UGC
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
D-ExF
A+BXC
X = (A + B X C) ∕ (D - E
X F)
BITS Pilani, Deemed to be University under Section 3 of UGC
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M
A+BxC /
D-ExF
X = (A + B X C) ∕ (D - E
X F)
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani Campus
DSRao
COA -IS ZC353
Memory
Characteristics of memory
systems
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical
characteristics
BITS Pilani, Pilani
The Big Picture: Where are We
Now?
• The Five Classic Components of a Computer
• Memory is usually implemented as:
– Dynamic Random Access Memory (DRAM) - for main
memory
– Static Random Access Memory (SRAM) - for cache
Processor
Input
Control
Memory
Datapath Output
BITS Pilani, Deemed to be University under Section 3 of UGC
Technology
Trends
Capacity Speed
(latency)
Logic: 2x in 3 2x in 3 years
years
DRAM: 4x in 3 2x in 10 years
years
Disk: 4x in 3DRAM 2x in 10 years
years
Year Size Cycle Time
1980 1000:1! 64 Kb 2:1! 250 ns
1983 256 Kb 220 ns
1986 1 Mb 190 ns
1989 4 Mb 165 ns
1992 16 Mb 145 ns
1995 64 Mb 120 ns
1998 256 Mb 100 ns
2001 1 Gb 80 ns
BITS Pilani, Deemed to be University under Section 3 of UGC
Who Cares About
Memory?
Processor-DRAM Memory Gap (latency)
1000 µProc CPU
60%/yr.
“Moore’s Law”
(2X/1.5yr)
Performance
100 Processor-Memory
Performance Gap:
(grows 50% / year)
10
DRAM
DRAM
9%/yr.
1 (2X/10 yrs)
1980
1981
1982
1983
1984
1985
2000
1986
1987
1989
1990
1991
1992
1993
1994
1995
1997
1998
1999
1988
1996
Time
BITS Pilani, Deemed to be University under Section 3 of UGC
Locatio
n
• Internal (main)
– CPU : In the form of registers, control unit
memory
– Cache , main memory
• External(secondary) CONTROL UNIT
– Disk andtapes Sequencing
logic
Control unit
registers
and
decoders
CPU
Control
ALU memory
Registers
CPU
Internal CPU
Interconnection
(Bus)
Control unit
BITS Pilani, Pilani
Capacit
y
• Number of bytes or words
• Word size
– The natural unit of organisation
• Common word lengths are 8,16,
and 32 bits
BITS Pilani, Pilani
Unit of
Transfer
• Internal
– Usually governed by data bus width
• Three related concepts for internal memory
– Word : The size of the word is typically equal to the
number of bits
used to represent an integer and to the instruction length.
» Exceptions:
• CRAY C90 has a 64-bit word length but uses a 46-bit integer
representation.
• The Intel x86 architecture has a wide variety of instruction
lengths, expressed as multiples of bytes, and a word size of 32
bits.
– Addressable units (Address space) : 2A = N
» Addressability : How many bits per location?
» Total Size= Addressability * Addressable unit
– Unit of transfer : Word or block
• External
– Usually a block which is much larger than a word
BITS Pilani, Pilani
Access Methods
(1)
• Sequential
– Start at the beginning and read through in
order
– Access time depends on location of data and
previous location
– e.g. tape
• Direct
– Individual blocks have unique address
– Access is by jumping to vicinity plus sequential
search
– Access time is variable i.e. access time
depends on location and previous location
BITS Pilani, Pilani
Access Methods
(2)
Random
– Addressable locations are identified by individual
addresses
– Access time is independent of location or previous
access
– e.g. RAM
Associative
– A word is retrieved based on a portion of its
content rather than its address.
– Access time is independent of location or previous
access
– e.g. cache
BITS Pilani, Pilani
Performanc
e
Access time or Latency
– Time between presenting the address and
getting the valid data
Memory Cycle time
– Memory Cycle time = access time +
Additional time
– Additional time may be required for transients
to die out on signal lines or to generate data if
they are read destructively
Transfer Rate
– Rate at which data can be transferred into
or out of a memory unit
BITS Pilani, Pilani
Physical
Types
• Semiconduct
or
– RAM
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others
– Bubble
– Hologram
BITS Pilani, Pilani
Physical
Characteristics
• Volatile vs Non-
volatile
• ROM vs RAM
BITS Pilani, Pilani
The Bottom
Line
• How much?
– Capacity
• How fast?
– Time is money
• How
expensive?
BITS Pilani, Pilani
The Bot om Line
….
• Faster access time , cost per
bit
• Greater capacity, cost per bit
• Greater capacity, access
time
BITS Pilani, Pilani
The Bot om Line
….
• Faster access time , greater cost
per bit
• Greater capacity, smaller cost per
bit
• Greater capacity, slower access
time
BITS Pilani, Pilani
Memory
Hierarchy
Memory technology Typical access time $ per GB in 2004
SRAM 0.5–5 ns $4000–$10,000
DRAM 50–70 ns $100–$200
Magnetic disk 5,000,000–20,000,000 ns $0.50–$2
CPU
Processor
Increasing
Level
distance from
1
the CPU in
access time
Levels in the Level 2
memory
hierarchy Data are
transferred
Level
n
Size of the memory at each
level
BITS Pilani, Deemed to be University under Section 3 of UGC
Hierarchy
List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
down the hierarchy,
• Increasing capacity
• Decreasing cost per bit
• Increasing access time
• Decreasing frequency of access
of the memory by the processor
BITS Pilani, Pilani
Memory Hierarchy: How Does it
Work?
Temporal Locality (Locality in Time):
Keep most recently accessed data items closer to the
processor
e.g., instructions in a loop
Spatial Locality (Locality in
Space):
Move blocks consists of
contiguous words to the upper
levels Lower Level
To instruction Upper Level
e.g., sequential Memory
Processor
access, array data Memory
Blk X
From Processor Blk Y
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Hierarchy:
Terminology
Hit: data appears in some block in the upper level (example: Block X)
Hit Rate: the fraction of memory access found in the upper level
Hit Time: Time to access the upper level which
consists of RAM access time + Time to
determine hit/miss
Miss: data needs to be retrieve from a block in the
lower level (Block Y)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: Time to replace a block in the upper level + Time
to deliver the block the processor
Hit Time << Miss Penalty Lower Level
To Upper Level Memory
Processor Memory
Blk X
From Processor Blk Y
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Hierarchy of a Modern Computer
System
By taking advantage of the principle of locality:
▪Present the user with as much memory as is available in the cheapest
technology.
▪Provide access at the speed offered by the fastest technology.
Processor
Control
Secondary Tertiar
Storage y
Second Main (Disk) Storage
Level Memory (Tape)
Registers
Datapath On-
Cache
Chip
Cache
(SRAM) (DRAM)
Speed (ns): 1s 10s 100s 10,000,000s 10,000,000,000s
(10s ms) (10s sec)
Size (bytes): 100s Ks Ms Gs Ts
BITS Pilani, Deemed to be University under Section 3 of UGC
General Principles of
Memory
Locality
Temporal Locality: referenced memory is likely to be referenced again soon (e.g. code within a loop)
Spatial Locality: memory close to referenced memory is likely to be referenced soon (e.g., data in a
sequentially access array)
Definitions
Upper: memory closer to processor
Block: minimum unit that is present or not present
Block address: location of block in memory
Hit: Data is found in the desired location
Hit time: time to access upper level
Miss rate: percentage of time item not
found in upper level
Locality + smaller HW is faster = memory
hierarchy
Levels: each smaller, faster, more
expensive/byte than level below BITS Pilani, Deemed to be University under Section 3 of UGC
In class Example
1
Suppose that the processor has access to two levels of memory. Level 1 contains
1000 words and has an access time of 0.01µ s; level 2 contains 100,000 words and
has an access time of 0.1µ s. Assume that if a word to be accessed is in level 1,
then the processor accesses it directly. If it is in level 2, then the word is first
transferred to level 1 and then accessed by the processor.
For simplicity, we ignore the time required for the processor to determine whether
the word is in level 1 or level 2.
Figure next shows the general shape of the curve that covers this situation. The
figure shows the average access time to a two-level memory as a function of the
hit ratio H, where H is defined as the fraction of all memory accesses that are
found in the faster memory (e.g., the cache), T1 is the access time to level 1, and
T2 is the access time to level 2.
As can be seen, for high percentages of level 1 access, the average total access
time is much closer to that of level 1 than that of level 2
BITS Pilani, Deemed to be University under Section 3 of UGC
In class example2
In our example, suppose 95% of the
memory accesses are found in the
cache. Then the average time to access
a word can be expressed as
(0.95)(0.01 µs) + (0.05)(0.01 µs + 0.1
µs)
= 0.0095 + 0.0055 = 0.015 µs
The average access time is much
closer
to 0.01µ s than to 0.1 µs, as desired
Performance of accesses
involving only level 1 (hit ratio)
BITS Pilani, Deemed to be University under Section 3 of UGC
Cach
e
Small amount of fast memory
Sits between normal main memory and
CPU May be located on CPU chip or
module
BITS Pilani, Deemed to be University under Section 3 of UGC
Cache operation - overview
CPU requests contents of memory
location
Cache Design
Check cache for this data Size
If present, get from cache (fast) Mapping Function
Replacement
If not present, read required block
Algorithm
from main memory to cache Write Policy
Then deliver from cache to CPU
Block Size
Cache includes tags to identify
Number of Caches
which block of main memory is in
each cache slot
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Cache/Main Memory
Structure
BITS Pilani, Pilani
Cache Read Operation -
Flowchart
Cache Misses - Read
• The control unit
1. must detect a miss
2. Stall the entire
processor
3. fetch the requested
data
from memory
• Steps taken by the control
unit on an instruction cache
miss
1. Send PC-4 to memory
2. Start reading the
main memory
3. Transferring block to
the cache
4. Restart the
instruction to start
fetching from the
BITS Pilani, Pilani
Read
Hit
BITS Pilani, Pilani
Read
Miss
Miss Penalty BITS Pilani, Pilani
Cache Misses -
Writes
On an instruction cache miss, we
Index the cache using bits 15-2 of the address
Write the cache entry
Data from processor is placed in data portion
Bits 31-16 of address written into tag field
Turn valid bit on
Write the word to main memory using the entire address
How do we keep main memory up to date ?
Write-through means writing to both cache and main
memory when a miss occurs (to avoid inconsistent or
untrue memories)
Write buffer uses a fast and small memory to store the
cache writes while it is waiting to be written out to
memory (in MIPS, it is 4 words)
Write-back means writing out to main memory only when
the block is swapped out
BITS Pilani, Pilani
Write
hit
BITS Pilani, Pilani
Write
miss
BITS Pilani, Pilani
Memory Organizations (1)
In part a, all components are one word wide
In part b, a wider memory, bus and cache are utilized
In part c, interleaved memory banks with a narrow bus
and cache are utilized
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Organizations
(2)
Assume that it takes
▪ 1 clock cycle to send the referenced address
▪ 15 clock cycles for each DRAM access initiated
▪ 1 clock cycle to send a word of data
In part a, we have a 1 + (4 x 15) + (4 x 1) = 65 clock cycle miss
penalty, and a
(4x4)/65 = 0.25 bytes per clock cycle
In part b, we have a 1 + (2 x 15) + (2 x 1) = 33 clock cycle miss
penalty, and a
(4x4)/33 = 0.48 bytes per clock cycle (assuming memory width of
two words)
▪ Wider bus and higher cache access time
In part c, we have a 1 + (1x15) + (4 x 1) = 20 clock cycle miss
penalty, and a BITS Pilani, Deemed to be University under Section 3 of UGC
Four Questions for Memory Hierarchy
Designers
Q1: Where can a block be placed in the upper level?
(Block placement)
Q2: How is a block found if it is in the upper level?
(Block identification)
Q3: Which block should be replaced on a miss?
(Block replacement)
Q4: What happens on a write?
(Write strategy)
BITS Pilani, Deemed to be University under Section 3 of UGC
Q1: Where can a block be
placed?
Direct Mapped: Each block has only one place
that it can appear in the cache.
Fully associative: Each block can be placed
anywhere in
the cache.
Set associative: Each block can be placed in a
restricted set of places in the cache.
If there are n blocks in a set, the cache placement
is called n-way set associative
What is the associativity of a direct mapped cache?
BITS Pilani, Deemed to be University under Section 3 of UGC
Locating data in the
Cache 31 30
0
13 12 11 21
Index is 10 bits, while tag is 20 e
Byt
offse
1
bits Hi
t
Ta
2
0 0
t
Dat
a
g
▪We need to address 1024 (210) words Inde
x
▪We could have any of 220 words per Index Valid Dat
a
Tag 0
cache 1
2
location
Valid bit indicates whether an
102
entry contains a valid 1
102
address or not Tag bits is 2
102
2
0
3
2
3
usually indicated by
address size – (log2(memory size) +
2)
▪E.g. 32 – (10 + 2) = 20
BITS Pilani, Deemed to be University under Section 3 of UGC
Elements of Cache
Design
Cache Addresses
Logical (also known as a virtual
cache)
Physical Write Policy
Cache Size
Write through
Mapping Function
Write back
Direct
Write once
Associative
Line Size
Set Associative
Replacement Algorithm Number of caches
Least recently used (LRU) Single or two level
First in first out (FIFO) Unified or split
Least frequently used (LFU)
Random
BITS Pilani, Deemed to be University under Section 3 of UGC
Size does
matter
Cost
• Small enough so that the overall average cost per bit is
close to that of main memory alone.
Speed
• Large enough so that the overall average access time
is close to
that of cache alone.
BITS Pilani, Pilani
Direct mapped cache:
Example Main Memory
• 16 Bytes byte
addressable main 0 0 00
memory 0 0 01
• How many 0 0 10
address bits are 0 0 11
required?
0 1 00
• Memory
– How block
many size
cacheis 4 0 1 01
bytes
lines? 0 1 10
• Cache
eacof 8isByte
– cache 2 lines of 4 0 1 11
bytes
h
Cache Memory
1 0 00
1 0 01
Line 0 1 0 10
1 0 11
1 1 00
Line 1 1 1 01
1 1 10
1 1 11
BITS Pilani, Pilani
Cache Memory Main Memory
0 0 00
Line 0 0 0 01
Block 0
Line 1 0 0 10
0 0 11
0 1 00
0 1 01 Block 1
0 1 10
0 1 11
1 0 00
1 0 01
1 0 10 Block 2
1 0 11
1 1 00
1 1 01
1 1 10
1 1 11 Block 3
BITS Pilani, Pilani
Direct
Mapping
• Each block of main memory maps to only one
cache line
– i.e. if a block is in cache, it must be in one specific
place
– i = j modulo m
– where i = cache line number
– j = main memory block number
– m = number of lines in the cache
• Address is split in two parts
• Least Significant w bits identify unique word
• Most Significant s bits specify memory block
• The MSBs are split into a cache line field r and
a tag of
BITS Pilani, Pilani
Direct Mapping Cache
Organization
BITS Pilani, Pilani
Direct mapping-
Summary
• Address length = (s+w) bits
• Number of addressable units = 2s+w words or
bytes
• Block size = line size = 2w words or bytes
• Number of blocks in main memory = 2s+w /
2w = 2s
• Number of lines in cache = m = 2r
• Size of tag =(s-r) bits
BITS Pilani, Pilani
Direct mapping
Summary
1 1 2
Tag s-r Line or Slot r Word w
BITS Pilani, Pilani
Measuring and Improving
Cache Performance
cache miss: A request for data from the cache that cannot be fulfilled
because the data is not present in the cache
CPU time =
(CPU execution clock cycles + Memory-stall clock cycles) × Clock
cycle time
Assume the miss rate of an instruction cache is 3% and
the miss rate of the data cache is 5%. If a processor
has a CPI of 3 without any memory stalls and the miss
penalty is 120 cycles for all misses, . Assume the
frequency of all loads and stores is 40%. Determine
How much faster a processor would run with a
perfect cache that never missed
BITS Pilani, Deemed to be University under Section 3 of UGC
Assume the miss rate of an instruction cache is 3% and
the miss rate of the data cache is 5%. If a processor
has a CPI of 3 without any memory stalls and the miss
penalty is 120 cycles for all misses, . Assume the
frequency of all loads and stores is 40%. Determine
How much faster a processor would run with a
perfect cache that never missed
Total Miss penalty = Ic X(0.03 X 0.6 X 120 + 0.05 X 0.4 X 120)
X 100
= (2.16+2.4)Ic=4.56Ic
For Perfect cache execution time= 3 X Ic = 3Ic
Total with imperfect= 3 X 0.92Ic + 4.56Ic = 2.76Ic + 4.56Ic =
7.32Ic
Speed up = 7.32/3= 2.44
BITS Pilani, Deemed to be University under Section 3 of UGC
Direct Mapped cache
Cache always smaller
(Block address) modulo (Number of blocks in the cache)
BITS Pilani, Deemed to be University under Section 3 of UGC
Exampl
e
• 32-word memory
Inde Vali Tag Data
• 8-word cache x d
000
• (The addresses 001
010
011
below are word 100
101
addresses.) 110
111
Address Binary Cache Hit or miss
block
22 10110 110
26 11010 010
22 10110 110
26 11010 010
16 10000 000
3 00011 011
16 10000 000
18 10010 010
BITS Pilani, Deemed to be University under Section 3 of UGC
Example:
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS Pilani, Deemed to be University under Section 3 of UGC
How many total bits are required for a directed mapped
cache with four byte blocks, if the cache contains 64 Kbytes of data and the
address length of the
processor is 32 bits? Assume that byte addressing is used.
With four byte blocks, the 64 Kbyte cache contains a total of 64K/4 = 16K blocks. The
cache has the following parameters
(a) Byte select size: 2 bits (since 22 = 4 bytes/block)
(b) Cache index size: 14 bits (since 214 = 16K blocks)
(c) Cache tag size: 16 bits (remaining bits from 32)
(d) Block size: 32 bits (4 bytes)
(e)Number of blocks: 16K blocks (cache size/block size)
Assuming that the cache has a valid bit, a cache tag, and
data for each block, the size of the cache is:
cache bits = number of blocks x (block size + tag size + 1)
= 214 x (32 + 16 + 1)
= 802,816 bits
BITS Pilani, Deemed to be University under Section 3 of UGC
How many bits are required if the block size is 32
bytes? if the cache
contains 64 Kbytes of data and the address length of the processor is 32
bits
With 32 byte blocks, the 64 Kbyte cache contains a total of 64K/32 = 2K blocks.
The cache has the following parameters
(a) Byte select size: 5 bits (since 25 = 32 bytes/block)
(b) Cache index size: 11 bits (since 211 = 2K blocks)
(c) Cache tag size: 16 bits (remaining bits from 32)
(d) Block size: 256 bits (32 bytes)
(e) Number of blocks: 2K blocks (cache size/block size)
cache bits = number of blocks x (block size + tag size + 1)
= 211 x (256 + 16 + 1)
= 559,104 bits
Note: Increasing the block size tends to decrease the number of bits needed to implement
the cache
BITS Pilani, Deemed to be University under Section 3 of UGC
Miss rate versus block
size
BITS Pilani, Deemed to be University under Section 3 of UGC
Mapping an Address to a Multiword Cache Block
Consider a cache with 32 blocks and a block size of 8 bytes. To what
block number does byte address 600 map?
Explore?
𝐵𝑦𝑡𝑒
Address of the block is 𝐴𝑑𝑑𝑟𝑒𝑠𝑠
𝐵𝑦𝑡𝑒𝑠 𝑝𝑒𝑟
This block contains all 𝑏𝑙𝑜𝑐𝑘of 8 bytes
addresses
𝐵𝑦𝑡𝑒
𝐴𝑑𝑑𝑟𝑒𝑠𝑠 × 𝐵𝑦𝑡𝑒𝑠 𝑝𝑒𝑟
𝐵𝑦𝑡𝑒 𝑝𝑒𝑟
𝑏𝑙𝑜𝑐𝑘
60
= 75 𝑏𝑙𝑜𝑐𝑘
08 is the block address
This maps to (75 modulo 32) = 11
In fact this contains all address between 600 to 607
BITS Pilani, Deemed to be University under Section 3 of UGC
Direct mapping
Summary
Cache Line Main Memory blocks assigned
0 0,m,2m,…., 2s-m
1 1, m+1, 2m+1…..,2s -m+1
. .
. .
. .
m-1 m-1,2m-1,…., 2s-1
BITS Pilani, Pilani
Analyse Directly Mapped cache organization using different cache
size and
block size : Explain what cache hit and miss rates are
Write a new program in the CPU simulator and name it as call it
Cache understanding1
MOV #0, R02
STB R02, @R02
CMP #63, R02
JEQ 28
INR R02
JMP 6
HLT
Question: Can you think of another instruction in place of INR R02?
It can be ADD #01, R02.
JEQ 28 instruction?
BITS Pilani, Deemed to be University under Section 3 of UGC
Configure the cache
Block Size = 4
Cache Type = Direct Mapped
Cache Size = 16
Write Policy = Write-Back
Change Block size
=2
BITS Pilani, Deemed to be University under Section 3 Act,
Change Block size =8
Insert Instruction LDB 9,
R08
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Reducing Cache Misses by More Flexible Placement of Blocks
Fully associative cache:
A cache structure in which a block can be placed in any location
in the
cache.
Set-associative cache:
A cache that has a fixed number of locations (at least two)
where each block can be placed.
A set-associative cache with n locations for a block is called an
n-way set- associative cache
BITS Pilani, Deemed to be University under Section 3 of UGC
Associative
Caches
Fully associative
▪Allow a given block to go in any cache
entry
▪Requires all entries to be searched at
once
▪Comparator per entry (expensive)
n-way set associative
▪Each set contains n entries
▪Block number determines which set
– (Block number) modulo (#Sets in
cache)
▪Search all entries in a given set at
once
▪n comparators (lessBITS
expensive)
Pilani, Deemed to be University under Section 3 of UGC
Associativity Examples
Cache size is 8 blocks
Where does word 12 from memory
go?
Fully associative:
Block 12 can go anywhere
Direct mapped:
Block no. = (Block address) mod
(No. of blocks in
cache)
Block 12 can go only into
block 4 (12 mod 8 = 4)
=> Access block using lower
3 bits
2-way set associative:
Set no. = (Block address)
mod
(No. of sets in
cache) Block 12 can go
BITS Pilani, Deemed to be University under Section 3 of UGC
anywhere in set 0 (12 mod 4
The location of a memory block whose
address is 12
Direct-mapped cache
(Block number) modulo (Number of blocks in the cache)
In direct-mapped placement, there is only one cache
block where memory block 12 can be found, and that
block is given by (12 modulo 8) 4.
Set-associative cache, set number
(Block number) modulo (Number of sets in the
cache)
In a two-way set-associative cache, there would be
four
sets, and memory block 12 must be in set (12 mod
4) 0;
BITS Pilani, Deemed to be University under Section 3 of UGC
The location of a memory block whose
address is 12
Set-associative cache, set number
(Block number) modulo (Number of sets in the
cache)
In a fully associative placement,
the memory block for block address 12 can
appear in any of the eight cache blocks.
BITS Pilani, Deemed to be University under Section 3 of UGC
A direct-mapped cache :a one-way set-associative cache:
A fully associative: cache with m entries is simply an m-way set-
associative cache; it
has one set with m blocks, and an entry can reside in any block within
that set BITS Pilani, Deemed to be University under Section 3 of UGC
BITS Pilani, Pilani
Replacement
Policy
• Direct mapped: no choice
• Set associative
– Prefer non-valid entry, if there is one
– Otherwise, choose among entries in the set
• Least-recently used (LRU)
– Choose the one unused for the longest time
• Simple for 2-way, manageable for 4-way, too hard
beyond that
• Random
– Gives approximately the same performance as LRU for
high
associativity
BITS Pilani, Deemed to be University under Section 3 of UGC
Associativity
Example
• Compare 4-block caches
Block Cache block
– Direct mapped, 2-way set Adrs
associative, fully associative 0 (0 modulo 4)=0
– Block access sequence: 0, 8, 0, 6 (6 modulo 4)=2
6, 8
8 (8 modulo 4)=0
• Direct mapped
Block Cach Hit/miss Cache content after access
addres e 0 1 2 3
s inde
x
0 0 miss Mem[0]
8 0 miss Mem[8]
0 0 miss Mem[0]
6 2 miss Mem[0] Mem[6]
Assume there8 are three small
0 caches, each consisting
miss Mem[8]of four one-wordMem[6]
blocks. One cache is fully
associative, a second is two-way set-associative, and the third is direct-mapped. Find the number of
misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, and 8.
BITS Pilani, Deemed to be University under Section 3 of UGC
Associativity
Example
• 2-way set associative : Two way means there are two sets with indices 0
and 1 Cach
Block Hit/miss Cache content after access
addres e Set 0 Set 1
s inde
x
0 0 miss Mem[0]
8 0 miss Mem[0] Mem[8]
0 0 hit Mem[0] Mem[8]
6 0 miss Mem[0] Mem[6]
8 0 miss Mem[8] Mem[6] Block Cache Set
Adrs
loc
Fully associative k
0 (0 modulo 2)=0
Set-associative caches usually replace the least recently
used b within a set 6 (6 modulo 2)=0
8 (8 modulo 2)=0
Block Hit/miss Cache content after access
addres
s
0 miss Mem[0]
8 miss Mem[0] Mem[8]
0 hit Mem[0]
BITS Mem[8]
Pilani, Deemed to be University under Section 3 of UGC
What is when the block size is 8
For two way set Associative we have 2 Sets (0,1)
Memory Hit/ Contents of cache blocks after reference
Adr Miss Set0 Set0 Set0 Set0 Set1 Set1 Set1 Set1
0 M 0
8 M 0 8
0 H 0 8
6 M 0 8 6
8 H 0 8 6
there would be no replacements in the two-way set-associative
cache;
and it would have the same number of misses as the fully associative cache.
Check for yourself.: if we had 16 blocks, all 3 caches would have the same
number of misses BITS Pilani, Deemed to be University under Section 3 of UGC
Q2: How Is a Block
Found?
The address can be divided into two main
parts
▪Block offset: selects the data from the
block
offset size = log2(block
size)
▪Block address: tag + index
– index: selects set in cache
index size = log2(#blocks/associativity)
– tag: compared to tag in cache to
determine hit
tag size = addreess size -
index size - offset size
Each block has a valid bit that tells if the block is valid - the block is
in the cache if the tags match and the valid bit is set.
BITS Pilani, Deemed to be University under Section 3 of UGC
Q3: Which Block Should be Replaced on
a Miss?
Easy for Direct Mapped - only on choice
Set Associative or Fully Associative:
▪Random - easier to implement
▪Least Recently used - harder to
implement
Miss rates for caches with different
Associativit 2-way
size, associativity and replacemnt 4-way 8-way
y:
algorithm.
Size LRU Random LRU Rando LRU Random
m
16 KB 5.18 5.69% 4.67% 4.39%5.29%4.96%
%
64 KB 1.88 2.01% 1.54% 1.66% 1.39% 1.53%
%
256caches
For KB 1.15
with low 1.17% 1.13%
miss rates, 1.13%
random 1.12%
is almost 1.12%
as good
as LRU. %
BITS Pilani, Deemed to be University under Section 3 of UGC
Q4: What Happens on a
Write?
Write through: The information is written to both the block in the
cache and to the block in the lower-level memory.
Write back: The information is written only to the block in the cache.
The modified
cache block is written to main memory only when it is replaced.
▪is block clean or dirty? (add a dirty bit to
each block) Pros and Cons of each:
▪Write through
– Read misses cannot result in writes to
memory,
– Easier to implement
– Always combine with write buffers to avoid
memory latency
▪Write back
– Less memory trafficBITS Pilani, Deemed to be University under Section 3 of UGC
Q4: What Happens on a
Write?
Since data does not have to be
brought into the cache on a write
miss, there are two options:
▪Write allocate
– The block is brought into the cache on a write
miss
– Used with write-back caches
– Hope subsequent writes to the block hit in cache
▪No-write allocate
– The block is modified in memory, but not brought
into the cach
– Used with write-through caches
– Writes have to go to memory anyway, so why
bring the
block into the cache
BITS Pilani, Deemed to be University under Section 3 of UGC
To explore
To what cache set in Figure
does the word at address
0x00000014 map? Name
another address that maps to
the same set.
BITS Pilani, Deemed to be University under Section 3 of UGC
Solution
0x00000014 H
Representing only lower byte : 0001 0100
The two least significant bits of the address are 00, because the address is
word aligned.
So the alignment is 101 00
The next three bits are 101, so the word maps to set 5.
The other address are 0001 0100 to xxx1 0100
Or for that matter in Hex xxxxxx14 and in increment of 20H
Words at addresses 0x34, 0x54, 0x74, … , 0xFFFFFFF4 all map to this same
set.
BITS Pilani, Deemed to be University under Section 3 of UGC
Measuring Cache
Performance
CPU time = (CPU execution clock cycles +
Memory stall clock cycles) Clock-
cycle time Memory stall clock cycles =
Read-stall cycles + Write-stall
cycles
Read-stall cycles = Reads/program
Read miss rate
Read miss penalty
Write-stall cycles = (Writes/program
Write miss rate
Write miss
penalty) + Write buffer stalls
(assumes write-through cache)
Write buffer stalls should be negligible and write and
read miss penalties equal (cost to fetch block from memory)
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory stall clock cycles = Mem access/program
EXAMPLE:
Suppose we have a processor with a base CPI of 1.0, assuming all
references hit in the primary cache, and a clock rate of 4 GHz.
Assume a main memory access time of 100 ns, including all the miss
handling. Suppose the miss rate per instruction at the primary
cache is 2%.
How much faster will the processor be if we add a secondary cache
that has a 5 ns access time for either a hit or a miss and is large
enough to reduce the miss rate to main memory to 0.5%?
BITS Pilani, Deemed to be University under Section 3 of UGC
Solution
The miss penalty to main memory 0.25𝑛is = 100𝑛𝑠 = 400
𝑐𝑦𝑐𝑙𝑒𝑠 𝑠
The effective CPI with one level of caching is given
by Total CPI = Base CPI + Memory-stall cycles per
instruction
For the processor with one level of caching,
Total CPI = 1.0 + Memory-stall cycles per
instruction
= 1.0 + 2% × 400 = 9
= 20
5𝑛(or first-level)
With two levels of caching, a miss in the primary
𝑠
0.25𝑛
is = be satisfied either by the secondary𝑠 cache
𝑐𝑦𝑐𝑙𝑒𝑠
penalty for an access to the second-level
cache
cache can or by main
memory. The miss BITS Pilani, Deemed to be University under Section 3 of UGC
Thus, for a two-level cache, total CPI is the sum of the stall
cycles from both levels of cache and the base CPI:
Total CPI = 1 + Primary stalls per instruction + stalls
per
Secondary instruction
= 1 + 2% × 20 + 0.5% × 400
= 1 + 0.4 + 2.0 = 3.4
Thus, the processor with the secondary cache is
faster by 9.0/3.4= 2.6
BITS Pilani, Deemed to be University under Section 3 of UGC
Comparison of different cache mapping, Directly
Mapped cache, set associative and fully associative
First flush the contents of the cache by clicking on the
FLUSH button. Then enter the following instructions after
the last LDB instruction in the above program:
LDB 0, R00
LDB 4, R00
LDB 17, R00
Now select the option
LDB 20, R00 Block Size = 4
Cache Type = Set Associative
Cache Size = 16
Set Blocks = 2-way
Write Policy = Write-Back
BITS Pilani, Deemed to be University under Section 3 of UGC
How much of a reduction in the miss
rate is achieved by associativity?
for a 64 KB data cache with a 16-word
block
Going from one-way(direct) to two-way associativity decreases the
miss rate by about 15%, but there is little further improvement in
going to higher associativity.
BITS Pilani, Deemed to be University under Section 3 of UGC
Effect of associativity on
performance
15
%
1
12 KB
%
2
9 KB
%
4
6 KB
% 8
KB
3 16
% KB
32
64 128
KB
0 KB KB
One- Two-way Four- Eight-
way Associativi way way
ty
BITS Pilani, Deemed to be University under Section 3 of UGC
Increasing associativity requires more comparators and
more tag bits
Assuming per cache
a cache block. a 16-byte block
of 4K blocks,
size, and a
32-bit address, find the total number of sets and the
total number of tag bits for caches that are required for
a direct Assuming a cache of 4K blocks, a 16-
. mapped, block size, and a 32-bit
Byte
b four-way
two-way set address
. associative,
and
c. and fully associative.
•Direct
d mapped cache:
• .16 bytes/block 28 bits for tag and index # sets = # blocks
• Log(4K) = 12 bits for index 16 bits for tag
• Total # of tag bits = 16 bits x 4K locations = 64 Kbits
Two-way set-associative cache:
32 bytes / set
16 bytes/block 28 bits for tag and index
# sets = # blocks / 2 2K sets
Log(2K) = 11 bits for index 17 bits for tag
Total # of tag bits = 17 bits x 2 location / set x 2K sets = 68 Kbits
BITS Pilani, Deemed to be University under Section 3 of UGC
Assuming a cache of 4K blocks, a 16-Byte block
size, and a 32-bit address
Sol:
Four-way set-associative cache:
64 bytes / set
16 bytes/block 28 bits for tag and index
# sets = # blocks / 4 1K sets
Log(1K) = 10 bits for index 18 bits for tag
Total # of tag bits = 18 bits x 4 location /
set x 1K sets = 72 Kbits
Fully associative cache:
1 set of 4 K blocks 28 bits for tag and index
Index = 0 bits tag will have 28 bits
Total # of tag bits = 28 bits x 4K location / set x
1 set = 112 Kbits
BITS Pilani, Deemed to be University under Section 3 of UGC
Multilevel Cache
Considerations
Primary cache
▪Focus on minimal hit time
L-2 cache
▪Focus on low miss rate to avoid main memory
access
▪Hit time has less overall
impact Results
▪L-1 cache usually smaller
than a single cache
▪L-1 block size smaller than
L-2 block size
BITS Pilani, Deemed to be University under Section 3 of UGC
Reducing the Miss Penalty using Multilevel
Caches
To further reduce the gap between fast clock rates of CPUs and the relatively long time
to access memory additional levels of cache are used (level two and level three
caches).
The primary cache is optimized for a fast hit rate, which implies a relatively small
size
A secondary cache is optimized to reduce the miss rate and penalty
needed to go to memory.
Example:
Assume CPI = 1 (with all hits) and 5 GHz clock
100 ns main memory access time
2% miss rate for primary cache
Secondary cache with 5 ns access time and miss rate of .5%
What is the total CPI with and without secondary cache?
BITS Pilani, Deemed to be University under Section 3 of UGC
Reducing the Miss Penalty using
Multilevel Caches
The miss penalty to main
memory: 100 ns / .2 ns per
cycle = 500 cycles
For the processor with only L1
cache:
Total CPI = 1 + 2% x 500 = 11
The miss penalty to access L2
cache: 5 ns / .2 ns per cycle =
25 cycles
If the miss is satisfied by L2
cache, then this is the only miss
penalty.
If the miss has to be resolved by the main memory, then the
total miss penalty is the sum of both
For the processor with both L1 and L2
BITS Pilani, Deemed to be University under Section 3 of UGC
caches: Total CPI = 1 + 2% x 25 + 0.5%
Memory Hierarchy
Framework
Three Cs used to model our memory
hierarchy
▪ Compulsory misses
– Cold-start misses caused by the first access to a block
– Solution is to increase the block size
▪ Capacity misses
– Caused when the cache is full and block needs to be
replaced
– Solution is to enlarge the cache
▪ Conflict misses
– Collision misses caused when multiple blocks compete
for the
same set, in the case of set-associative and fully-
associative
BITS Pilani, Deemed to be University under Section 3 of UGC
Design
Tradeoffs
As in everything in engineering, multiple design tradeoffs
exist when discussing memory hierarchies
There are many more factors involved, but the presented ones are
the most important and accessible ones
Change Effect on miss rate Negative effect
Increase size Decreases capacity misses May increase
access time
Increase Decreases miss rate due to May increase
associativity conflict misses access time
Increase ecreases miss rate for a wide May increase
block size range of block size miss penalty
BITS Pilani, Deemed to be University under Section 3 of UGC
Exampl
e
A computer system contains a main memory of 32K 16-bit
words. It has also a 4Kword cache divided into 4-line sets with
64 words per line. The processor fetches words from locations
0, 1, 2 , … , 4351 in that order sequentially 10 times. The
cache is 10 times faster than the main memory. Assume LRU
policy.
With no cache
Fetch time = (10 passes) (68 blocks/pass) (10T/block) =
6800T
With cache
Fetch time =(68)
(11T) first pass
other
+ (9) (48) (T) + (9) (20) passes
(11T) BITS Pilani, Deemed to be University under Section 3 of UGC
Modern
Systems
BITS Pilani, Deemed to be University under Section 3 of UGC
four-way set-
associative cache
BITS Pilani, Deemed to be University under Section 3 of UGC
Input/Output Module
Necessity
• Wide variety of peripherals
o Delivering different amounts of data (data size)
o At different speeds ,can be slower or faster than the
processor or memory
o In different formats
• All slower than CPU and RAM
• Need I/O modules
• Interface to CPU and Memory
• Interface to one or more peripherals
External Device
An external device connects to the computer by a link to an I/O module
This is called peripheral device or peripheral
Link could be data line. Control line or status line
BITS Pilani, Deemed to be University under Section 3 of UGC
Generic Model of I/O
Module
BITS Pilani, Deemed to be University under Section 3 of UGC
External Devices
classification
• Human readable suitable for user
o Screen, printer, keyboard
• Machine readable suitable for machine
o Monitoring and control
• Communication suitable for remote devices
o Modem
o Network Interface Card (NIC)
External Device Block
Diagram
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O Module
Function
•Control & Timing
•CPU Communication
•Device
Communication
•Data Buffering
•Error Detection
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O module control Steps
•CPU checks I/O module device
status
•I/O module returns status
•If ready, CPU requests data
transfer
through a command to the I/O
module
•I/O module gets data from device
•I/O module transfers data to CPU
BITS Pilani, Deemed to be University under Section 3 of UGC
For all these there has to be communication between the processor and
the I/O module , thus
Processor communication involves
Command decoding
data exchange on the data bus
Status reporting eg. BUSY, READY
Address recognition : I/O module
should recognize each peripheral that it
controls
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O Module
Diagram
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O Module
Decisions
• Hide or reveal device properties to CPU
• Support multiple or single device
• Control device functions or leave for CPU
• I/O channel or I/O processor takes
most of the processing burden
• I/O controller or device controller requires
detailed control from the processor
•Also O/S decisions
o e.g. Unix treats everything it can as a
BITS Pilani, Deemed to be University under Section 3 of UGC
Input Output
Techniques
⚫ Recall that the rate of transfer to and from I/O devices is
slower than the speed of the processor. This creates the need
for mechanisms to synchronize data transfers between them.
⚫ Program-controlled I/O:
⚫ Processor repeatedly monitors a status flag to achieve the
necessary synchronization.
⚫ Processor polls the I/O device.
⚫ Two other mechanisms used for synchronizing data transfers
between the processor and memory:
⚫ Interrupts.
⚫ Direct Memory Access
BITS Pilani, Deemed to be University under Section 3 of UGC
Three Techniques for Input of
a Block of Data
BITS Pilani, Deemed to be University under Section 3 of UGC
Programmed
I/O
• CPU has direct control over I/O
o Sensing status
o Read/write commands
o Transferring data
• CPU waits for I/O module to complete
operation
• Wastes CPU time
• CPU requests I/O operation
• I/O module performs operation
• I/O module sets status bits
• CPU checks status bits periodically
• I/O module does not inform CPU directly
• I/O module does not interrupt CPU
• CPU may wait or come back later
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O
Commands
• CPU issues address
o Identifies module (& device if >1 per module)
• CPU issues command
o Control - telling module what to do
e.g. spin up disk
o Test - check status
e.g. power? Error?
o Read/Write
Module transfers data via buffer from/to device
Addressing I/O Devices
• Under programmed I/O data transfer is very like
memory access
(CPU viewpoint)
• Each device given unique identifier
• CPU commands contain identifier (address)
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O
Mapping
• Memory mapped I/O
oDevices and memory share an address space
oI/O looks just like memory read/write
oNo special commands for I/O
Large selection of memory access commands
available
• Isolated I/O
oSeparate address spaces
oNeed I/O or memory select lines
oSpecial commands for I/O
Limited set
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Mapped and Isolated
I/O
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupt Driven
I/O
• Overcomes CPU waiting
• No repeated CPU checking of device
• I/O module interrupts when ready
Basic Operation
• CPU issues read command
• I/O module gets data from peripheral whilst
CPU does other work
• I/O module interrupts CPU
• CPU requests data
• I/O module transfers data
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrup
ts
In program-controlled I/O, when the processor continuously
monitors the status of the device, it does not perform any
useful tasks.
An alternate approach would be for the I/O device to alert the
processor when it becomes ready.
▪Do so by sending a hardware signal called an interrupt to
the
processor.
▪At least one of the bus control lines, called an interrupt-
request line is dedicated for this purpose.
Processor can perform other useful tasks while it is waiting for
the
device to be ready. BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupt
s
(contd..) Program
1
1
Interrupt Service
routine
2
Interru
pt i
occurs
here i+
1
•Processor is executing the instruction located at address i when an
interrupt occurs.
•Routine executed in response to an interrupt request is called the
interrupt-service routine.
•When an interrupt occurs, control must be transferred to the interrupt
service routine.
•But before transferring control, the current contents of the PC (i+1),
must be saved in a known location.
BITS Pilani, Deemed to be University under Section 3 of UGC
•This will enable the return-from-interrupt instruction to resume
Interrupts (contd..)
⚫ Treatment of an interrupt-service routine is very similar to that of a
subroutine.
⚫ However there are significant differences:
⚫A subroutine performs a task that is required by the calling
program.
⚫Interrupt-service routine may not have anything in
common with the program it interrupts.
⚫Interrupt-service routine and the program that it interrupts
may belong to different users.
⚫As a result, before branching to the interrupt-service routine,
not only the PC, but other information such as condition code
flags, and processor registers used by both the interrupted
program and the interrupt service routine must be stored.
⚫This will enable the interrupted program to resume execution
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupts
(contd..)
⚫ Saving and restoring information can be done automatically by the
processor or explicitly by program instructions.
⚫ Saving and restoring registers involves memory transfers:
⚫Increases the total execution time.
⚫Increases the delay between the time an interrupt request is
received, and the start of execution of the interrupt-service
routine. This delay is called interrupt latency.
⚫ In order to reduce the interrupt latency, most processors save only
the minimal amount of information:
⚫This minimal amount of information includes Program
Counter and processor status registers.
⚫ Any additional information that must be saved, must be
saved explicitly by the
program instructions at the beginning of the interrupt service
BITS Pilani, Deemed to be University under Section 3 of UGC
routine.
Interrupts
(contd..)
When a processor receives an interrupt-request, it must branch to
the interrupt service routine.
It must also inform the device that it has recognized the
interrupt request. This can be accomplished in two ways:
▪Some processors have an explicit interrupt-acknowledge
control signal for this
purpose.
▪In other cases, the data transfer that takes place between the
device and the processor can be used to inform the device.
BITS Pilani, Deemed to be University under Section 3 of UGC
Simple Interrupt
Processing
CPU Viewpoint
•Issue read command
• Do other work
• Check for interrupt at
end of each
instruction cycle
• If interrupted:-
o Save
context
(registers)
o Process
interrupt
Fetch
data
BITS Pilani, Deemed to be University under Section &
3 of UGC
Changes in Memory and
Registers for an Interrupt
BITS Pilani, Deemed to be University under Section 3 of UGC
Design Issues
• How do you identify the module issuing the
interrupt?
• How do you deal with multiple interrupts?
o i.e. an interrupt handler being interrupted
Identifying Interrupting Module (1)
• Different line for each module
o PC
o Limits number of devices
•Software poll
o CPU asks each module in turn eg. TESTI/O
o or reads the status register of each I/O
module
o Slow & time consuming
BITS Pilani, Deemed to be University under Section 3 of UGC
Identifying Interrupting Module
(2)
• Daisy Chain or Hardware poll
o Interrupt Acknowledge sent down a chain
o Module responsible places vector on bus
o CPU uses vector to identify handler routine
• Bus Master
o Module must claim the bus before it can raise
interrupt
o e.g. PCI & SCSI
Daisy-chaining interrupts at
the same level of priority
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple
Interrupts
• Each interrupt line has a priority
• Higher priority lines can interrupt lower priority
lines
• If bus mastering only current master can
interrupt
• Example - PC Bus
• 80x86 has one interrupt line
• 8086 based systems use one 8259A interrupt
controller
• 8259A has 8 interrupt lines
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupts (contd..)
Consider a system employing interrupt-driven I/O for a particular
device that transfers data at an average of 8 KB/s on a continuous
basis. a. Assume that interrupt processing takes about 100 µs (i.e.,
the time to jump to the interrupt service routine (ISR), execute it,
and return to the main program).
Determine what fraction of processor time is consumed by this I/O
device if it interrupts for every byte.
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupts (contd..)
Now assume that the device has two 16-byte buffers and interrupts
the processor when one of the buffers is full. Naturally, interrupt
processing takes longer, because the ISR must transfer 16 bytes.
While executing the ISR, the processor takes about 8µ s for the
transfer of each byte. Determine what fraction of processor time is
consumed by this I/O device in this case
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupts (contd..)
Now assume that the processor is equipped with a block transfer I/O
instruction such as that found on the Z8000. This permits the associated
ISR to transfer each byte of a block in only 2 µs. Determine what fraction
of processor time is consumed by this I/O device in this
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupts
(contd..)
A particular system is controlled by an operator through commands
entered from a keyboard. The average number of commands entered
in an 8-hour interval is 60.
a. Suppose the processor scans the keyboard every 100 ms. How many
times will
the keyboard be checked in an 8-hour period?
b.By what fraction would the number of processor visits to the keyboard
be reduced if interrupt-driven I/O were used?
BITS Pilani, Deemed to be University under Section 3 of UGC
Sequence of Events
• 82C59A accepts interrupts
• 82C59A determines priority
• 82C59A signals 8086 (raises INTR line)
• CPU Acknowledges
• 82C59A puts correct vector on data bus
• CPU processes interrupt
ISA Bus Interrupt System
• ISA bus chains two 82C59As together
• Link is via interrupt 2
• Gives 15 lines
o 16 lines less one for link
• IRQ 9 is used to re-route anything trying to use
IRQ 2
• Incorporated
o Backwardsincompatibility
chip BITS Pilani, Deemed to be University under Section 3 of UGC
82C59A Interrupt
Controller
BITS Pilani, Deemed to be University under Section 3 of UGC
Intel 82C55A
Programmable Peripheral Interface
BITS Pilani, Deemed to be University under Section 3 of UGC
The Intel 8255A
Control Word
(b) Bit definitions of the 8255 control
register to modify single bits of port C
(a) Mode denition of the 8255 control register to
configure the 8255
BITS Pilani, Deemed to be University under Section 3 of UGC
Keyboard/Display Interface to
8255A
BITS Pilani, Deemed to be University under Section 3 of UGC
Direct Memory
Access
• Interrupt driven and programmed I/O require
active CPU intervention
o Transfer rate is limited
o CPU is tied up
• DMA is the answer
DMA Function
• Additional Module (hardware) on bus
• DMA controller takes over from CPU for I/O
BITS Pilani, Deemed to be University under Section 3 of UGC
DM
A
BITS Pilani, Deemed to be University under Section 3 of UGC
Typical DMA Module
Diagram DMA
•
Operation
CPU tells DMA controller:-
o Read/Write
o Device address
o Starting address of
memory block for data
o Amount of data to
be transferred
• CPU carries on with
other work
• DMA controller deals
with transfer
• DMA controller sends
interrupt when finished
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA Transfer Cycle
Stealing
• DMA controller takes over bus for a cycle
• Transfer of one word of data
• Not an interrupt
o CPU does not switch context
• CPU suspended just before it accesses bus
o i.e. before an operand or data fetch or a data
write
• Slows down CPU but not as much as CPU doing
transfer
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA and Interrupt Breakpoints
During an Instruction Cycle
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA
(contd..)
A DMA module is transferring characters to memory using cycle stealing,
from a device transmitting at 9600 bps. The processor is fetching
instructions at the rate of 1 million instructions per second (1 MIPS). By
how much will the processor be slowed down due to the DMA activity?
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA Configurations
(1)
•Single Bus, Detached DMA
controller
•Each transfer uses bus twice
oI/O to DMA then DMA to memory
•CPU is suspended twice
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA Configurations
(2)
• Single Bus, Integrated DMA
controller
• Controller may support >1
device
• Each transfer uses bus once
o DMA to memory
• CPU is suspended once
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA Configurations
(3)
• Separate I/O Bus
• Bus supports all DMA enabled
devices
• Each transfer uses bus once
o DMA to memory
• CPU is suspended once
BITS Pilani, Deemed to be University under Section 3 of UGC
Intel 8237A DMA
Controller
• Interfaces to 80x86 family and DRAM
• When DMA module needs buses it sends HOLDRequest (HRQ) signal to
processor
• CPU responds HLDA (hold acknowledge)
o DMA module can use buses
• E.g. transfer data from memory to disk
1. Device requests service of DMA by pulling DREQ (DMA
request) high 2.DMA puts high on HRQ (hold request),
3.CPU finishes present bus cycle (not necessarily present
instruction) and puts
high on HDLA (hold acknowledge). HOLD remains active for
duration of DMA
4.DMA activates DACK (DMA acknowledge), telling device to start
transfer 5.DMA starts transfer by putting address of first byte on
address bus and
8237 DMA
Usage of
Systems Bus
BITS Pilani, Deemed to be University under Section 3 of UGC
Fly-
By(8237)
• While DMA using buses processor idle
• Processor using bus, DMA idle
o Known as fly-by DMA controller
• Data does not pass through and is not stored in
DMA chip
o DMA only between I/O port and memory
o Not between two I/O ports or two memory
locations
• Can do memory to memory via register
• 8237 contains four DMA channels
o Programmed independently
o Any one active
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O Channels and its evolution
• I/O devices getting more sophisticated
• e.g. 3D graphics cards
• CPU instructs I/O controller to do transfer
• I/O controller does entire transfer
• I/O module is enhanced to become a processor in its
own right
• I/O module has memory of its own
• Improves speed
o Takes load off CPU
o Dedicated processor is faster
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O Channel
e
Architectur
A selector channel controls multiple high-
speed devices and, at any one time, is
dedicated to the transfer of data with one of
those devices. Thus, the I/O channel selects
one device and effects the data transfer
A multiplexor channel can handle I/O
with multiple devices at the same time
Thus, the I/O channel serves in place of the
CPU in
controlling these I/O controllers
BITS Pilani, Deemed to be University under Section 3 of UGC
Example
A given processor requires 1000 cycles to perform a context switch and start an
interrupt handler (and the same number of cycles to switch back to the program
that was running when the interrupt occurred), or 500 cycles to poll an I/O
device.An I/O device attached to that processor makes 150 requests per second,
each of which takes 10,000 cycles to resolve once the handler has been started.
By default , the processor polls every 0.5 ms if it is not using interrupts
How many cycles per second does the processor spend handling I/O from
the device if interrupts are used?
Sol: The device makes 150 requests, each of which require one interrupt.
Each interrupt
takes 12,000 cycles (1000 to start the handler, 10,000 for the handler,1000
to switch
back to the original program), for a total of 1,800,000 cycles spent handling
BITS Pilani, Deemed to be University under Section 3 of UGC
Example contnd.
b. How many cycles per second are spent on I/O if polling is used (include all polling
attempts)? Assume the processor only polls during time slices when user programs
are not running, so do not include any context-switch time in your calculation
Sol: The processor polls every 0.5 ms, or 2000 times/s. Each polling attempt takes
500 cycles, so it spends 1,000,000 cycles/s polling. In 150 of the polling attempts,
a request is waiting from the I/O device, each of which takes 10,000 cycles to
complete for another 1,500,000 cycles.
Therefore, the total time spent on I/O each second is 2,500,000 cycles with polling.
C. How often would the processor have to poll for polling to take as many cycles
per second as
interrupts
Sol: In the polling case, the 150 polling attempts that find a request waiting
consume 1,500,000
cycles. Therefore, for polling to match the interrupt case, an additional 300,000
BITS Pilani, Deemed to be University under Section 3 of UGC
Example 2.
An I/O device transfers 10 MB/s of data into the memory of a processor over the I/O
bus, which has
a total data transfer capacity of 100 MB/s. The 10 MB/s of data is transferred as 2500
independent
pages of 4 KB each. If the processor operates at 200 MHz, it takes 1000 cycles to
initiate a DMA transaction, and 1500 cycles to respond to the device's interrupt when
the DMA transfer completes, what fraction of the processor's time is spent handling
the data transfer with and without DMA?
Sol:Without DMA, the processor must copy the data into the memory as the I/O device
sends it over the bus. Because the device sends 10 MB/s over the I/O bus, which has a
capacity of 100 MB/s, 10% of each second is spent transferring data
over the bus.
Assuming the processor is busy handling data during the time that each page is being
transferred over the bus (which is a reasonable assumption because the time
between transfers is too short to be worth doing a context
BITS Pilani, Deemed toswitch), then
be University 10%
under of3the
Section of UGC
Example 2. Sol
With DMA, the processor is free to work on other tasks, except when
initiating each DMA and responding to the DMA interrupt at the end of
each transfer. This takes 2500 cycles/transfer, or a total of 6,250,000
cycles spent handling DMA each second.
Because the processor operates at 200 MHz, this means that 3.125% of
each second, or 3.125% of the processor's time, is spent handling DMA,
less than 1/3 of
the overhead without DMA.
BITS Pilani, Deemed to be University under Section 3 of UGC
Example
A microprocessor provides an instruction capable of moving a string of
bytes from one area of memory to another. The fetching and initial
decoding of the instruction takes 10 clock cycles. Thereafter, it takes 15
clock cycles to transfer each byte. The microprocessor is clocked at a
rate of 10 GHz.
a. Determine the length of the instruction cycle for the case of a string
of 64 bytes.
b.What is the worst-case delay for acknowledging an interrupt if the
instruction is non-interruptible?
c.Repeat part (b) assuming the instruction can be interrupted at the
beginning of each byte transfer.
BITS Pilani, Deemed to be University under Section 3 of UGC
Interesting information
the 1st drive with a capacity 1 GByte ,
produced
by IBM in 1980
weighed 250 kg
cost $40,000
size of a kitchen appliance
2 BITS Pilani, Deemed to be University under Section 3,
Magnetic
disk
A disk is a circular platter constructed of
nonmagnetic material, called the substrate,
coated with a magnetizable material.
Traditionally, the substrate has been an
aluminum or aluminum alloy material. More
recently, glass substrates have been introduced.
The glass substrate has a number of benefits
Improvement in the uniformity of the magnetic film surface
to increase
disk reliability
A significant reduction in overall surface defects to help
reduce read-write errors
Better stiffness to reduce disk dynamics
Greater ability to withstand shock and damage
BITS Pilani, Deemed to be University under Section 3 of UGC
Disk Data Layout
BITS Pilani, Deemed to be University under Section 3 of UGC
Organization of Data on a
Disk
Sector 0,
Sector 2,
track 1
trackn Sector 0,
track 0
The set of corresponding tracks on all surfaces
of a stack of disk form a cylinder
Organization of one surface of a The data are accessed by specifying the
disk. surface number, the track and the sector
number
BITS Pilani, Deemed to be University under Section 3 of UGC
Disk Performance Parameter
Seek time: is the time required to move the disk arm to the required track.
Rotational delay: Disks, other than floppy disks, rotate at speeds ranging from
3600 rpm (for handheld devices such as digital cameras) up to, as of this writing,
20,000 rpm;
Transfer time : The transfer time to or from the disk depends on the rotation
speed of the disk T = b /rN
T = transfer time ;b = number of bytes to be transferred; N = number of bytes
on a track ; r = rotation speed, in revolutions per second
Ttotal = Ts + 1/2r + b/rN
BITS Pilani, Deemed to be University under Section 3 of UGC
A timing
comparison
Consider a disk with an advertised average seek time of 4 ms, rotation speed of 15,000
rpm, and 512-byte sectors with 500 sectors per track. Suppose that we wish to read a file
consisting of 2500 sectors for a total of 1.28 Mbytes
First, let us assume that the file is stored as compactly as possible on the disk. That is, the
file occupies all of the sectors on 5 adjacent tracks (5 tracks * 500 sectors/track = 2500
sectors). This is known as sequential organization
Average seek 4 ms; Average rotational delay 2 ms
Read 500 sectors= b/rN= (512X500 X 60)/(15000 X500X512)=4 ms ; Thus Total=10 ms
Suppose that the remaining tracks can now be read with essentially no seek time.
That is, the I/O operation can keep up with the flow from the disk. Then, at most, we
need to deal with rotational delay for the four remaining tracks. Thus each successive
track is read in 2 + 4 = 6 ms.
To read the entire file, Total time = 10 + (4 * 6) = 34 ms = 0.034 seconds
BITS Pilani, Deemed to be University under Section 3 of UGC
A timing
comparison
Consider a disk with an advertised average seek time of 4 ms, rotation speed of 15,000
rpm, and 512-byte sectors with 500 sectors per track. Suppose that we wish to read a file
consisting of 2500 sectors for a total of 1.28 Mbytes
Now let us calculate the time required to read the same data using random access rather
than sequential access;
that is, accesses to the sectors are distributed randomly over the disk.
For each sector, we have
Average seek 4 ms
Rotational delay 2 ms
Read 1 sector =
b/rN= (512X
60)/(15000
X500X512)= 0.008 ms
Which is
6.008 ms
Total time = 2500 * BITS Pilani, Deemed to be University under Section 3 of UGC
Access Time
Seek Time: 5 to 8 ms( Time required to move the
read/write head to proper track
Rotational delay or latency time(time to reach the
correct sector over the track:
On average This is the time for half a rotation of disk
Access Time is the sum of these delays
Disk capacity( of a 3.5 inch diameter )
20 Data recording surface
15000 tracks / surface
400 sectors per track
20 X 15000 X 400 X 512 ≅ 60 × 109= 60
each sector contains 512 bytes of data
GBytes
Access time
=
6
0
10000×
Typical Average seek time 6 ms
= 3𝑚𝑠
rotate at 10,000 rpm so latency
2
BITS Pilani, Deemed to be University under Section 3 of UGC
Some basic
concepts
• Maximum size of the Main Memory is determined
by the addressing scheme
• byte-addressable(big endian or…
Little endian)
• CPU-Main Memory Connection
Processor Memory
k-
address
bit
MAR bus
n-bit
data
bus Up to 2k
MDR addressable
locations
Word length = n
Control lines bits
( R / W , MFC,
etc.)
BITS Pilani, Deemed to be University under Section 3 of UGC
Big Endian or Little
Endian
16 bit address capable
of 2^16= 64k Mempry
locations
What abt with 32 bit
instructions?
2^32= 4 G memory
locations
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory timing definitions
Memory Access time: The time between
initiation of a task and its completion for example
time elapsed between Read and MFC (memory
function complete)
Memory Cycle time: the minimum time delay
between two successive memory operations , eg:
the time between two successive write operations
Memory cycle time is usually larger than
the Memory access time
BITS Pilani, Deemed to be University under Section 3 of UGC
Internal organization of memory
chips 16 X 8 orgaization
BITS Pilani, Deemed to be University under Section 3 of UGC
Internal organization of memory
chips
• Each memory cell can hold one bit of information.
• Memory cells are organized in the form of an array.
• One row is one memory word.
• All cells of a row are connected to a common line,
known as the “word line”.
• Word line is connected to the address decoder.
• Sense/write circuits are connected to the data
input/output
lines of the memory chip.
• The memory circuit stores 128 bits
• Requires 14 external connections for address data
and
control lines
• & another two for Vdd and Grnd
BITS Pilani, Deemed to be University under Section 3 of UGC
Address is 404 for a 1K‐word
memory.
Comparison wrt to
Decoder requirement
How many AND(NAND)
gates required for 10 to 1k
decoder?
How many AND(NAND)
gates
required for 5 to 32k
decoder?
BITS Pilani, Deemed to be University under Section 3 of UGC
Address multiplexing for a 64K
DRAM
After a time
equivalent to the
settling time of the
row selection, RAS
goes back to the 1
level.
Registers are
used to store the
addresses of the
row and column.
CAS must go
back to the 1 level
before initialing
another memory
operation.
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory
organisation
Figure indicates how to construct a module of chips that can store 1 MB
based on a group of four 256-Kbyte chips. Let’s say this module of chips is
packaged as a single 1-MB chip, where the word size is 1 byte. Give a
high-level chip diagram of how to construct an 8-MB computer memory
using eight 1-MB chips. Be sure to show the address lines in your diagram
and what the address lines are used for.
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory
organisation
BITS Pilani, Deemed to be University under Section 3 of UGC
SRAM Ce
l
• Two transistor inverters are cross connected to implement a
basic flip-flop.
• The cell is connected to one word line and two bits lines by
transistors T1 and T2
• When word line is at ground level, the transistors are turned off
and the
latch retains its state
• Read operation: In order to read state of SRAM cell, the word
line is activated to close switches T1 and T2. Sense/Write
b
circuits at the bottom monitor the state ofbb and b’
T2
T X Y
1
Word line
Bit lines
BITS Pilani, Deemed to be University under Section 3 of UGC
DRAM
S
The address line is activated when
the bit value from this cell is to
be read or written
The transistor acts as a switch
In a read operation the sense
amplifier connected to bit line
decides whether the voltage is above
some threshold
Then it drives the bit line to the
full voltage representing 1
for zero the bit line is grounded
thus discharging the capacitor
completely
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous DRAMs
• Static RAMs (SRAMs):
– Consist of circuits that are capable of retaining their
state as long as the power is applied.
– Volatile memories, because their contents are lost when
power is interrupted.
– Access time of static RAMs are in the range of few
nanoseconds.
– However, the cost is usually high.
• Dynamic RAMs (DRAMs):
– Do not retain their state indefinitely. So called dynamic
– Contents must be periodically refreshed.
– Contents may be refreshed while accessing them for
reading.
BITS Pilani, Deemed to be University under Section 3 of UGC
DRAM
operation
Consider a dynamic RAM that must be given a refresh cycle 64
times per ms. Each refresh operation requires 150 ns; a memory
cycle requires 250 ns.
What percentage of the memory’s total operating time must be
given to refreshes?
BITS Pilani, Deemed to be University under Section 3 of UGC
DRAM
timing
Figure shows a simplified timing diagram for a DRAM read operation over a
bus. The access time is considered to last from t1 to t2. Then there is a
recharge time, lasting from t2 to t3, during which the DRAM chips will have
to recharge before the processor can access them again.
a. Assume that the access time is 60 ns and the recharge time is 40 ns.
What is the memory cycle time? What is the maximum data rate this
DRAM can sustain, assuming a 1-bit output?
b. b. Constructing a 32-bit wide memory system using these chips
yields what data transfer rate?
BITS Pilani, Deemed to be University under Section 3 of UGC
DRAM
timing
BITS Pilani, Deemed to be University under Section 3 of UGC
Internal organization of 2M X 8
dynamic memory chip
Arranged in 4k X 4k
• Each row can store 512
RAS
bytes. 12 bits to select a
row, and 9 bits to select a
group in a row. Total of 21
Row Row 4096 bits.
addres
decode 512 8
s latch
r cell array • First apply the row
address, RAS signal
latches the row address.
Then apply the column
Sense / CS address, CAS signal
A20- 9 A8 -
Write
0 circuits R/ W latches the address.
• Timing of the memory unit
is controlled by a
Colum
n Colum specialized unit which
n
addres decode generates RAS and CAS.
s latch r
• This is asynchronous
CAS D7 D0 DRAM
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous
DRAMS
• Operation is
directly synchronized with
processor clock signal.
• The outputs of the
sense circuits are connected
to a latch.
• During a Read
operation, the contents
of the cells in a row are
loaded onto the latches.
• During a refresh
operation, the contents of
the cells are refreshed
without changing the
contents of the latches.
• Data held in the
latches correspond to the
selected columns are
transferred to the output.
• For a burst mode of
operation, successive columns
are under
BITS Pilani, Deemed to be University selected
Section 3using
of UGC column
Static memories implementation : example
Start here 12/04/2024
Implement a memory unit of 2M words of 32
bits each.
Use 512kx8 static memory chips.
Each column consists of 4 chips.
Each chip implements one byte data.
A chip is selected by setting its chip select
control line to 1.
Selected chip places its data on
the data output line, outputs of other
chips are in high impedance state.
21 bits to address a 32-bit word.
High order 2 bits are needed to select
the row, by activating the four Chip Select
signals.
19 bits are used to access specific byte
BITS Pilani, Deemed to be University under Section 3 of UGC
Static memories implementation of 2M X 32 bits Implemented as
512k X 8 X 4
BITS Pilani, Deemed to be University under Section 3 of UGC
Latency, Bandwidth, and DDRSDRAMs
Memory latency is the time it takes to transfer a word of data to or
from memory
Memory bandwidth is the number of bits or bytes that can
be transferred in one second.
DDRSDRAMs: on both edges
Cell array is organized in two banks
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory
controller
Row/
Address address
Column
RAS
R/ W
Memor CAS
Request y R/ W
Processor controll
CS Memory
er
Clock
Clock
Dat
a
BITS Pilani, Deemed to be University under Section 3 of UGC
Refreshing overhead
When used with DRAM chips which do not have
a self refreshing capability Memory controller has
to provide all the information needed for
refershing
Consider a memory of 8k= 8192 Rows
4 cycles to refresh each row then 8192 X 4
= 32768 cycles to refresh all rows
In typical SDRAMs a typical refresh period is
64 ms
at a clock rate of 133Mhz
time needed to refresh all rows =
32768/133MHz = 0.246ms
overhead= 0.246/64= 0.0038 = 0.4% total
time available for accessing memory
BITS Pilani, Deemed to be University under Section 3 of UGC
Interleaving
◾ Divides the memory system into a
number of memory modules. Each module has
its own address buffer register (ABR) and data buffer register (DBR).
◾ Arranges addressing so that successive
words in the address space are
placed in different modules.
◾ When requests for memory access
involve consecutive addresses, the
access will be to different modules.
◾ Since parallel access to these
modules is possible, the average rate
of fetching words from the Main
Memory can be increased.
BITS Pilani, Deemed to be University under Section 3 of UGC
Methods of address
layouts
k m
bits bits
Module Address in MM
module address
ABR DBR ABR DBR ABR DBR
Modul Module Module
e0 i n- 1
◾ Consecutive words are placed in a module.
◾ High-order k bits of a memory address determine
the module.
◾ Low-order m bits of a memory address determine
the
word within a module.
◾ When a block of words is transferred from main memory
to cache, only one module is busy at a time.
BITS Pilani, Deemed to be University under Section 3 of UGC
Methods of address layouts(Interleaving)
m bits k
bits
Address in Module MM
module address
•Consecutive words
located in are
modules. consecutive
ABR DBR ABR DBR ABR DBR
•Consecutive addresses
Modul
e0
Module
i
Modul
e 2k -
be
canlocated in consecutive
1 modules.
• While transferring a
block of data, several
memory modules can be
kept busy at the same time.
BITS Pilani, Deemed to be University under Section 3 of UGC
Exampl
e
9 0000
8 0001
00 01 10
7 0010
00 9 00 8 00 7
6 0011
01 5 01 4 01 3
5 0100
10 1 10 15 10 14
4 0101
11 12 11 11 11 10
3 0110
2 0111
1 1000
15 1001 11
14 1010 00 6
13 1011 01 2
12 1100 10 13
11 1101 11 0
10 1110
0 1111
BITS Pilani, Deemed to be University under Section 3 of UGC
Example: (Definition of
cylinders)
A disk has 24 recording surface has 14000 cylinders. There is an
average of 400 sectors per track. Each sector contains 512 Bytes
of data
What is the maximum number of bytes that can be stored?
The maximum number of bytes that can be stored on this disk is
24 X14000 X400X 512 = 68.8 X 109 bytes
What is the data transfer rate in bytes per second at a rotational
speed of 7200rpm?
The data transfer rate is (400 X 512 X 7200)/60 = 24.58 X 106
bytes/s.
BITS Pilani, Deemed to be University under Section 3 of UGC
The number of cylinders of a disk drive exactly equals the number
of tracks on a single surface in the drive. It comprises the same track
number on each platter, spanning all such tracks across each platter
surface that is able to store data
Example: Contnd
A disk has 24 recording surface has 14000
tracks. There is an
average of 400 sectors per track. Each sector
contains 512 Bytes
of data
Using a 32 bit word , suggest a suitable addressing scheme for
the disk, assuming that there are 512 bytes / Sector
Need 9 bits to identify a sector, 14 bits for a track, and 5 bits for a
surface.
Thus, a possible scheme is to use address bits A8-0 for sector,
A22-9 for track,
BITS Pilani, Deemed to be University under Section 3 of UGC
Example
What is the average time to read or write a 512-byte
sector for a typical disk rotating at 15,000 RPM? The
advertised average seek time is 4 ms, the transfer
rate is 100 MB/sec, and the controller overhead is 0.2
ms. Assume that the disk is idle so that there is no
waiting time.
Sol:
Average disk access time = average seek time + average rotational
Delay + transfer time + controller overhead
rotational latency Also
seek The process of called rotational delay.
positioning a The time required for the
read/write head over desired sector of a disk to
the proper track on a rotate under the read/write
disk & seek time is the head;
time consumed in it usually assumed to be half
the rotation time.
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the average time to read or write a 512-byte
sector for a typical disk rotating at 15,000 RPM? The
advertised average seek time is 4 ms, the transfer
rate is 100 MB/sec, and the controller overhead is 0.2
ms. Assume that the disk is idle so that there is no
waiting time.
Sol:
Average disk access time = average seek time + average rotational
Delay + transfer time + controller overhead
=4.0 ms + 0.5 rotation /15,000 RPM + 0.5 KB/100 MB/sec + 0.2 ms
= 4.0 + 2.0 + 0.005 + 0.2 = 6.2 ms
If the measured average seek time is 25% of the advertised average
time, then it is
1.0 ms + 2.0 ms + 0.005 ms + 0.2 ms = 3.2 ms
BITS Pilani, Deemed to be University under Section 3 of UGC
Exampl
e:
Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64 X 1 bit. Give the array
configuration of the chips on the memory board showing all required input and output signals for
assigning this memory to the lowest address space. The design should allow for both byte and 16-bit
word accesses.
Sol:
Number of total addressable locations of 16 bits= 8192/16= 512
no of 64 X 1 chips required 8192/64= 128 chips
There will be 128/16=8 row of 16 bits
512 can be rranged into 8 rows X 64 columns
For 8 rows no of decoder bits 3
For 64 columns no of decoder bits 6
BITS Pilani, Deemed to be University under Section 3 of UGC
Solution
BITS Pilani, Deemed to be University under Section 3 of UGC
Example
•Consider a dynamic RAM that must be given a refresh
cycle 64 times per ms. Each refresh operation requires
150 ns; a memory cycle requires 250 ns.What percenta
of the memory’s total operating time must be given to
refreshes?
BITS Pilani, Deemed to be University under Section 3 of UGC
Example
•Consider a dynamic RAM that must be given a refresh
cycle 64 times per ms. Each refresh operation requires
150 ns; a memory cycle requires 250 ns.What percenta
of the memory’s total operating time must be given to
refreshes?
•Sol:
9600ns/1ms X 100= 0.96 % ≅ 1%
• Total Time consumed in 1 ms = 64 X 150 ns= 9600 ns
%=
out of 1ms
150𝑛𝑠 ×100
refresh operation consumes250
60% 𝑛𝑠
•Out of total time of memory cycle of 250 ns
BITS Pilani, Deemed to be University under Section 3 of UGC
Figure shows a simplified timing diagram for a DRAM read
operation over a bus.
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the memory cycle time?
What is the maximum data rate this DRAM can
sustain, assuming a 1-bit output?
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?
Sol: memory cycle time = (60+ 40)=100 ns
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?
Sol: memory cycle time = (60+ 40)=100 ns
1 bit in 100ns so 1/100ns per/ sec= 10Mbps
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?
Sol: memory cycle time = (60+ 40)=100 ns
1 bit in 100ns so 1/100ns per/ sec= 10Mbps
Constructing a 32-bit wide memory system using these chips yields
what data transfer rate?
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?
Sol: memory cycle time = (60+ 40)=100 ns
1 bit in 100ns so 1/100ns per/ sec= 10Mbps
Constructing a 32-bit wide memory system using these chips yields
what data transfer rate?
Ans 320Mbps= 320Mbps/8 = 40MB/s
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple level
cache
Average Access Time
Given a 100 MHz machine with a with a miss penalty of 20 cycles, a
hit time of 2 cycles, and a miss rate of 5%, calculate the average
memory access time (AMAT).
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple level cache
Given a 100 MHz machine with a with a miss penalty of 20 cycles, a
hit time of 2 cycles, and a miss rate of 5%, calculate the average
memory access time (AMAT).
AMAT = hit_time + miss_rate x miss_penalty
Since the clock rate is 100 MHz, the cycle time is:
1/(100 MHz) = 10 ns
which gives
AMAT = 10 ns x (2 + 20 x 0.05) = 30 ns
Note: Here we needed to multiply by the cycle time
because the hit_time and miss_penalty were given in
cycles. BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple level
cache
Suppose doubling the size of the cache decrease the miss rate
to 3%, but causes the hit time to increases to 3 cycles and the
miss penalty to increase to 21 cycles. What is the AMAT of the
new machine?
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple level
cache
Suppose doubling the size of the cache decrease
the miss rate to 3%, but causes the hit time to
increases to 3 cycles and the miss penalty to
increase to 21 cycles. What is the AMAT of the
new machine?
For the new machine:
AMAT = 10 ns x (3 + 21 x 0.03) = 36.3 ns
BITS Pilani, Deemed to be University under Section 3 of UGC
Mass
Storage
• Many systems today need to store many
terabytes of data
• Don’t want to use single, large disk
– too expensive
– failures could be dangerous
• Would prefer to use many smaller disks
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID
• Redundant Array of Independent Disks
• Basic idea is to connect multiple disks
together to provide
– large storage capacity
– faster access to reading data
– redundant data
• Many different levels of RAID systems
– differing levels of redundancy, error checking,
capacity, and cost
BITS Pilani, Deemed to be University under Section 3 of UGC
Striping
• Take file data and map it to different disks
• Allows for reading data in parallel
file block 0 block 1 block 2 block 3
data
Disk Disk Disk Disk
0 1 2 3
BITS Pilani, Deemed to be University under Section 3 of UGC
MIRRORING
file block 0 block 1 block 2 block 3 block 4
data
1 block 1 block
0 0
sector 2 block 2 block
sector
s 1 1
s
3 block 3 block
2 2
4 block 4 block
3 3
5 block 5 block
Disk Disk
4 41
0
5 5
BITS Pilani, Deemed to be University under Section 3 of UGC
Mirrorin
g
• Keep two copies of data on two separate disks
• Gives good error recovery
– if some data is lost, get it from the other source
• Expensive
– requires twice as many disks
• Write performance can be slow
– have to write data to two different spots
• Read performance is enhanced
– can read data from file in parallel
BITS Pilani, Deemed to be University under Section 3 of UGC
Parit
y
• Way to do error checking and correction
• Add up all the bits that are 1
– if even number, set parity bit to 0
– if odd number, set parity bit to 1
• To actually implement this, do an exclusive OR of
all the bits being considered
• Consider the following 2 bytes
byte parity
10110011 1
01101010 0
• If a single bit is bad, it is possible to correct it
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID
Level-0
• Often called striping
• Break a file into blocks of data
• Stripe the blocks across disks in the system
• Simple to implement
– disk = file block % number of disks
– sector = file block / number of disks
• provides no redundancy or error detection
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level-0
file block 0 block 1 block 2 block 3 block 4
data
1 block 1 block
0 1
sector 2 block 2 block
sector
s 2 3
s
3 block 2
4 3
3 4
4 5
5
Disk Disk
0 1
BITS Pilani, Deemed to be University under Section 3 of UGC
Data Mapping For RAID 0
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level-1
• A complete file is stored on a single disk
• A second disk contains an exact copy of the file
• Provides complete redundancy of data
• Read performance can be improved
– file data can be read in parallel
• Write performance suffers
– must write the data out twice
• Most expensive RAID implementation
– requires twice as much storage space
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level-1
file block 0 block 1 block 2 block 3 block 4
data
1 block 1 block
0 0
sector 2 block 2 block
sector
s 1 1
s
3 block 3 block
2 2
4 block 4 block
3 3
5 block 5 block
Disk Disk
4 41
0
5 5
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 2
Uses Bit-level striping with Hamming codes for ECC
Number of disks required depends on exact implementation
Only fair fault tolerance
Advantages
Random Read performance=fair
Sequential Read Performance=very good
Sequential Write performance=good
Disadvantages
Random Write Performance=poor
Requires a complex controller
High overhead for check disks
Not used in modern systems
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 2
RAID Level 3
Uses byte-level striping with dedicated parity
Requires minimum three drives to implement
Has good fault-tolerance
Advantages
Random Read Performance=good
Sequential Read performance=very good
Sequential Write performance=fair to good
Lowest overhead for check disks
Disadvantages
Random Write performance=poor
Complex controller design
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 3
RAID Level 4
Uses Block-level striping with dedicated parity
Requires minimum of 3 drives to implement
Has good fault-tolerance
Advantages
Random Read Performance=very good
Sequential Read and Write performance=good
Lowest overhead of check disks
Disadvantages
Quite complex controller design
Random write performance=poor
Not commonly used
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 4
RAID Level 5
Uses Block-level striping with distributed parity
Requires a minimum of 3 drives to implement
Advantages
Random Read performance=very good
Random Write performance=fair
Sequential Read and Write performance=good
Lowest overhead of check disks
Disadvantages
Most complex controller design
Difficult to rebuild in the event of a disk failure
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 5
RAID 6
• Two parity calculations
• Stored in separate blocks on different disks
• User requirement of N disks needs N+2
• High data availability
– Three disks need to fail for data loss
– Significant write penalty
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 6
Raid Level 6 uses Block-level striping with dual
distributed parity
RAID Level 1+0
Uses Mirroring and striping without parity
RAID Level 0+1
Uses Mirroring and striping without Parity
CONTROL UNIT OPERATION
The execution of an instruction involves the execution of a
sequence of substeps, generally called cycles
The Fetch Cycle
The Indirect Cycle
The Interrupt Cycle
The Execute Cycle
The Instruction Cycle
control unit of a processor performs two tasks:
step through a series of micro-operations in the proper
sequence
generates the control signals
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
CONTROL UNIT OPERATION
The control signals generated by the control unit cause the
opening and closing of logic gates, resulting in the transfer of
data to and from registers and the operation of the ALU
technique for implementing a control unit
hardwired implementation
Microprogrammed control
Micro-operations are a sequence of more
fundamental operations which constitute the operation
cycle.
A single micro-operation generally involves a transfer
between registers,
a transfer between a register and an external bus, or a
simple ALU operation
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-Operations
A computer executes a
program Fetch/execute
cycle
Each cycle has a number of steps : Called
micro- operations
Each step does very little: Atomic operation of
CPU micro-operations needed to perform the
subcycles of the instruction cycle.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Constituent
Elements of Program
Execution
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
The Fetch
Cycle
Memory Address Register (MAR)
▪Connected to address bus
▪Specifies address for read or write op
Memory Buffer Register (MBR)
▪Connected to data bus
▪Holds data to write or last data read
Program Counter (PC)
▪Holds address of next instruction to be
fetched
Instruction Register (IR)
▪Holds last instruction fetched
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Executing an Instruction
Fetch the contents of the memory location
pointed to by the PC. The contents of this
location are loaded into the IR (fetch
phase).
IR ← [[PC]]
Assuming that the memory is byte
addressable, increment the contents of
the PC by 4 (Assume 4 Byte instruction).
PC ← [PC] + 4
Carry out the actions specified by the
instruction in BITS
Act, 1956 IR (execution phase).
thePilani, Deemed to be University under Section 3 of UGC
Processor
Organizati
on
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Internal
organization of
the processor
o ALU
o Registers for temporary storage
o Various digital circuits for executing
different micro operations.(gates,
MUX,decoders,counters).
o Internal path for movement of data
between ALU and registers.
o Driver circuits for transmitting signals to
external units.
o Receiver BITS
circuits for
Pilani, Deemed
Act, 1956
incoming
to be University signals
under Section 3 of UGC
PC:
Keeps track of execution of a program
Contains the memory address of the next instruction to be
fetched and executed.
MAR:
Holds the address of the location to be accessed.
I/P of MAR is connected to Internal bus and an O/p to
external bus. MDR:
Contains data to be written into or read out of the
addressed location.
IT has 2 inputs and 2 Outputs.
Data can be loaded into MDR either from memory bus or
from internal processor bus.
The data and address lines are connected to the internal bus via
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Register
s:
The processor registers R0 to Rn-1 vary considerably from one
processor to another.
Registers are provided for general purpose used by
programmer.
Special purpose registers-index & stack registers.
Registers Y,Z &TEMP are temporary registers :used by
processor during the execution of some instruction.
Multiplexer:
Select either the output of the register Y or a constant value 4 to be
provided as input A of the ALU.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
ALU:
Used to perform arithmetic and logical operation.
Data Path:
The registers, ALU and interconnecting bus are
collectively referred to as the data path.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Register
Transfer
The input and output
s gates for register Ri are
controlled by signals in
Rin and Riout .
Rin Is set to1 – data
available on common
bus are loaded into Ri.
Riout Is set to1 – the
contents of register are
placed on the bus.
Riout Is set to 0 – the
bus can be used for
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Data transfer between two registers:
EX:
Transfer the contents of R1 to R4.
1. Enable output of register R1 by setting R1out=1. This places the
contents of R1 on the processor bus.
2. Enable input of register R4 by setting R4in=1. This loads the
data from the processor bus into register R4.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Fetch
Sequence
(symbolic)
t1
:
MAR <- (PC)
MBR <-
t2 (memory)
:
PC <- (PC) +n
t3
: IR <- (MBR)
(tx = time unit/clock
cycle) or
t1 MAR <- (PC)
:
MBR <-
t2
(memory) PC
:
<- (PC) +n
t3
Assume
: IRn<-
byte
(MBR)
instruction
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Rules for Clock Cycle Grouping
Proper sequence must be followed
▪MAR <- (PC) must precede MBR <-
(memory) Conflicts must be avoided
▪Must not read & write same register at
same time
▪MBR <- (memory) & IR <- (MBR) must not be
in same cycle
Also: PC <- (PC) +n involves addition
▪Use ALU
▪May need additional micro-operations
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Sequence of Events, Fetch
Cycle
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Interrupt
Cycle
t1 MBR <-(PC)
:
MAR <- save-
t2 address
:
PC <- routine-
t3
address
:
This is a minimum
memory <- (MBR)
▪May be additional micro-ops to get addresses
▪N.B. saving context is done by interrupt handler routine,
not micro- ops
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
The Execute Cycle
Different for each instruction
e.g. ADD R1,X - add the contents of location X to
Register 1 , result in R1
t1: MAR <- (IRaddress)
t2: MBR <-
(memory) t3: R1 <-
R1 + (MBR)
Note no overlap of
micro-operations
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Performing an Arithmetic or Logic
Operation The ALU is a combinational
R3=
R1+R2 circuit that has no internal
storage.
ALU gets the two
operands from MUX and
bus. The result is
temporarily stored in
register Z.
What is the sequence of
operations to add the
contents of register R1 to
those of R2 and store the
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956 result in R ?
Performing an Arithmetic or Logic
Operation Step 1: Output of the register
R3=
R1+R2 R1 and input of the register Y
are enabled, causing the
contents of R1 to be
transferred to Y.
Step 2: The multiplexer’s select
signal is set to select Y causing the
multiplexer to gate the contents of
register Y to input A of the ALU. At
the same time contents of R2 are
gated into the bus and hence to
input B
Step 3: The contents of Z are
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Register
Transfers
Input and output gating for one register
bit.
Rin clk Q
0 Q
Io Q
1 B
B
When Rout=1 B=Q.
otherwise in high
impedance state
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Fetching a Word from
Memory
. Connection and control signals for register MDR
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Instruction
Cycle
Each phase decomposed into sequence of elementary micro-
operations
E.g. fetch, indirect, and interrupt cycles Execute cycle
▪One sequence of micro-operations for each opcode
Assume new 2-bit register called ICC
▪Instruction cycle code (ICC) designates which part of cycle
processor is in
– 00: Fetch
– 01: Indirect
– 10: Execute
– 11: Interrupt
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Flowchart for
Instruction
Cycle
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Requirements for CONTROL OF THE
PROCESSOR
the following three-step process leads to a characterization of
the control unit
1. Define the basic elements of the processor.
2. Describe the micro-operations that the processor performs.
3. Determine the functions that the control unit must
perform to cause the micro-operations to be performed
The control unit performs two basic tasks:
•Sequencing: The control unit causes the processor to step
through a series of micro-operations in the proper sequence,
based on the program being executed.
•Execution: The control unit causes each micro-
operation to be performed
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Control Unit Clock
Block Diagram of the Control ▪One micro-instruction (or set of
Unit
parallel micro-instructions) per
clock cycle
Instruction register
▪Op-code for current instruction
▪Determines which micro-
instructions are performed
Flags
▪State of CPU
▪Results of previous operations
From control bus
▪Interrupts
▪Acknowledgements
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Example Control Signal Sequence - Fetch
MAR <- (PC)
▪Control unit activates signal to open gates between PC
and MAR
MBR <- (memory)
▪Open gates between MAR and address bus
▪Memory read control signal
▪Open gates between data bus and MBR
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Data Paths and Control Signals
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Control Unit with Decoded
Inputs
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Hardwired control
Signal
Let us consider a single control signal, C5. This signal causes data to be
read from the external data bus into the MBR
PQ = 11 Instruction C; PQ = 10 EC ;PQ = 01 Indirect Cycle; PQ =
00 FC
Then the following Boolean expression defines C5
This expression is not complete. C5 is also needed during the
execute cycle. For our simple example, let us assume that there
are only three instructions that read from memory: LDA, ADD, and
AND. Now we can define C5
This same process could be repeated for every control signal
generated by the processor
BITS Pilani, Deemed to be University under Section 3 of UGC
Hardwired
Implementation (1)
Control unit inputs
Flags and control bus
–Each bit means
something Instruction
register
– Op-code causes different control signals for each
different instruction
– Unique logic for each op-code
– Decoder takes encoded input and produces single
output
– n binary inputs and 2n outputs
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Hardwired
Implementation (2)
Clock
– Repetitive sequence of pulses
– Useful for measuring duration of micro-ops
– Must be long enough to allow signal propagation
– Different control signals at different times within
instruction cycle
– Need a counter with different control signals for t1, t2
etc.
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Detailed Block
Description
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Generating
Zin
Control sequence for an unconditional branch
instruction
Step Action
Add (R3),
1 PCout , MAR in , Read, Select4,Add, R1
2 Z in
3 Zout , PCin , Yin , WMF
4 C MDRout , IRin
5 Offset-field-of-IRout,
Add, Zin
Zout, PCin , End
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Generating
Zin
Branc Ad
h d
T4
T6
T1
Zin = T1 + T6 • ADD + T4 • BR
+…
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Problems With Hard Wired
Designs
Complex sequencing & micro-operation
logic
Difficult to design and test
Inflexible design
Difficult to add new instructions
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Generating
End
End = T7 • ADD + T5 • BR + (T5 • N + T4 • N) •
BRN + …
Control Step counter to its starting value
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Microprogrammed Control
Control signals are generated by a program similar to machine language
programs. Control Word (CW); Microroutine; individual
microinstruction
Add R1,
(R3)
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-programmed Control Implementation
Use sequences of instructions to control complex
operations Called micro-programming or firmware
All the control unit does is generate a set of control
signals
Each control signal is on or off
Represent each control signal by a bit
Have a control word for each micro-operation
Have a sequence of control words for each machine
code
instruction
Add an address to specify the next micro-
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-programmed Control Implementation
Today’s large microprocessor
▪Many instructions and associated register-level
hardware
▪Many control points to be
manipulated This results in
control memory that
▪Contains a large number of
words
– corresponding to the number of
instructions to be executed
▪Has a wide word width
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-
program
Word Length Based on 3 factors
▪Maximum number of simultaneous micro-operations
supported
▪The way control information is represented or encoded
▪The way in which the next micro-instruction address is
specified
Types
Each micro-instruction specifies single (or few) micro-
operations to be performed
▪ (vertical micro-programming)
Each micro-instruction specifies many different micro-
operations to be performed in parallel
▪(horizontal micro-programming)
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-
program
Vertical Micro-programming
Width is narrow
n control signals encoded into log2 n bits
Limited ability to express parallelism
Considerable encoding of control information
requires external memory word decoder to identify
the exact control line being manipulated
Horizontal Micro-programming
Wide memory word
High degree of parallel operations possible
Little encoding of control information
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Typical Microinstruction
Formats
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-program
Organization ofControl
Memory
Compromise
Divide control signals into
disjoint groups
Implement each group
as separate field in
memory word
Supports reasonable
levels of parallelism
without too much
complexity
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execution of a Complete Instruction (self study)
Add R1, (R3)
Fetch the instruction
Fetch the first operand (the contents of the
memory location pointed to by R3)
Perform the addition
Load the result into R1
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execution of a Complete
Instruction Add R1,
(R3)
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Data Flow,
Fetch Cycle Add R1,
(R3)
WMFC: Wait for memory function check to accommodate the
variability in response time, The addressed device sets this
signal to 1 to indicate that contents of the specified location has
been read BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Example
Write the sequence of control steps needed to add the immediate
number to R1 as per the bus structure shown
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Example
1. PCout, MARin, Read, Select4, Add,
Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. PCout, MARin, Read, Select4, Add,
Zin
5. Zout, PCin, Yin
6. R1out, Yin, WMFC
7. MDRout, SelectY, Add, Zin
8. Zout, R1in, End
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execute Cycle (ISZ)
•ISZ X - increment and skip if zero
▪t•The content
MAR of) location X is incremented by 1. If the re
<- (IRaddress
1:
is 0, the next <-
MBR
▪t
(memory) MBR
•instruction
2: is skipped. A possible sequence of micro-
<- (MBR) +is
▪toperations 1
▪3: memory <-
▪t
Notes: (MBR)
▪4:
if is a single if (MBR)
micro-
operation == 0
then PC
▪Micro-operations done
<- (PC)
during t4 + 1 BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execute Cycle (BSA)
BSA X - Branch and save address
▪Address of instruction following BSA is
saved in X
▪
▪Execution continues from X+1
t MAR <- (IRaddress)
▪1: MBR <- (PC)
▪t PC <- (IRaddress)
▪2: memory <-
▪t (MBR) PC <- (PC) + 1
3:
The address in the PC at the start of the instruction is the address of the next
instruction in sequence. This is saved at the address designated in the IR.
The latter address is also incremented to provide the address of the
instruction for the next instruction cycle
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Multiple-Bus
Organization
•Allow the contents of two
different registers to be
accessed simultaneously and
have their contents placed on
buses A and B.
•Allow the data on bus C to
be loaded into a third
register during the same
clock cycle.
• Incrementer unit.
•ALU simply passes one of
its two
input operands unmodified
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
to bus C
Add R6, R4, R5
Step Action
1 PCout, R=B, MAR in , Read,
2 IncPC
WMFC
3 MDR outB , R=B, IR
4 in SelectA, Add, R6in ,
R4outA,
End
R5outB,
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
What is Superscalar?
Common instructions (arithmetic, load/store, conditional
branch) can be initiated and executed independently
Equally applicable to RISC & CISC
In practice usually RISC
In this example, two integer, two floating-point, and one memory (either
load or store) operations can be executing at the same time.
BITS Pilani, Deemed to be University under Section 3 of UGC
Superscalar v Superpipelined
FI DI EI WBSimple 4-stage pipeline
Superpipelined
Many pipeline stages
need less than half a
clock cycle
Double internal
Degree 2
clock speed gets two
tasks per
external clock cycle
Superscalar allows
parallel fetch execute
BITS Pilani, Deemed to be University under Section 3 of UGC
Superscalar approach
Instruction level
parallelism Compiler
based optimisation
Hardware techniques
Limited by
▪True data dependency
▪Procedural
dependency
▪Resource conflicts
▪Output dependency
▪Antidependency BITS Pilani, Deemed to be University under Section 3 of UGC
Effect of Dependencies
True Data Dependency
ADD r1, r2 (r1 :=
r1+r2;)
MOVE r3,r1 (r3 := r1;) Degree 2
Can fetch and decode second
instruction in parallel with first
Can NOT execute second
instruction until first is
finished
Procedural
Can not execute instructions
Dependency
after a branch in parallel with
instructions before a branch
Also, if instruction length is not
fixed, instructions have to be
decoded to find out how many
fetches are needed
BITS Pilani, Deemed to be University under Section 3 of UGC
Effect of
Dependencies
Resource Conflict
Two or more instructions Degree 2
requiring access to the same
resource at
the same time
e.g. two arithmetic
instructions Can duplicate
resources
e.g. have two arithmetic
units
BITS Pilani, Deemed to be University under Section 3 of UGC
Dependency due to memory Cycle
MOV EAX, ; load register EAX with the
eff
contents of effective memory
MOV EBX, address eff
EAX
;load EBX with the contents of
For RISC processor
two or more cyclesEAX
when the load is a cache hit
tens or even hundreds of cycles for a cache miss
sol: for the compiler to reorder instructions
This scheme is less effective in the case of a superscalar pipeline:
The independent instructions executed during the load are likely to
be executed
on the first cycle of the load, leaving the processor with nothing to do
until the
load completes
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Design Issues:Instruction-Level Parallelism and Machine Pa
Instruction level parallelism
Instructions in a sequence are
independent
Execution can be overlapped
Governed by data and procedural
Example
dependency
Independent
Load R1 ← R2 Showing
Dependency Add
Add R3 ← R3, R4 ← R3, R2 Add
“1” Add R3 ← R4 ← R4, R2
Store [R4] ← R0
R3, “1”
Machine Parallelism
Ability to take advantage of instruction level
parallelism
Governed by number of parallel pipelines
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Issue Policy
Instruction issue policy to refer to the protocol
used to issue instructions
Three types of orderings are important
Order in which instructions are fetched
Order in which instructions are executed
Order in which instructions change
registers and memory
the processor must accommodate the
various
dependencies and conflicts.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
superscalar instruction issue policies
In-order issue with in-order completion
In-order issue with out-of-order
completion
Out-of-order issue with out-of-order
completion
In-Order Issue In-Order Completion
Issue instructions in the order they
occur
Not very efficient
May fetch >1 instruction
Instructions must stall if necessary
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
In-Order Issue In-Order Completion
(Diagram)
I1 requires two cycles two execute
I3 & I4 conflict for the same
functional unit
I5 depends on the value produced
by I4
I5 and I6 conflict for a functional unit
Superscalar Instruction Issue and Completion
Policies BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
In-Order Issue Out-of-Order
Completion (Diagram)
Superscalar Instruction Issue and Completion
Policies
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
In-Order Issue Out-of-Order
Completion
Output dependency
▪ I1: R3 ← R3 op R5
▪I2: R4 ← R3 + 1
▪ I3: R3 ← R5 + 1
▪I4: R7 ← R3 op R4
▪ I2 depends on result of I1 – True data dependency: Instruction
I2 cannot execute before instruction I1
▪If I3 completes before I1, the result from I1 will be wrong -
output (read-write)
dependency
▪I4 must wait for I3, because it uses a result pro duced by I3.
▪ I3 must complete after I1 to produce the correct output values
in I4
▪Eg. Of a Write after write (WAW) dependency.
▪ Here more difficult to deal with instruction interrupts and exceptions
as at the time of interruption, instructions
BITS Pilani, Deemed ahead
to be University of the
under Section instruction
3 of UGC
Act, 1956
Out-of-Order Issue Out-of-Order
Completion
Decouple decode pipeline from execution pipeline
Can continue to fetch and decode until this pipeline is full
When a functional unit becomes available an
instruction can be executed
Since instructions have been decoded, processor can
look ahead
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Antidependency
Read after-write(RAW) dependency
I1: R3 ← R3 op
R5 I2: R4 ← R3
+ 1 I3: R3 ← R5
+ 1 I4: R7 ← R3
op R4
Instruction I3 cannot complete execution before
instruction I2 begins execution and has fetched its
operands. This is so because I3 updates register R3,
which is a source operand for I2.
The term antidependency is used because the
constraint is similar to that of a true data
BITS Pilani, Deemed to be University under Section 3 of UGC
dependency, but reversed: Act, 1956
Reduced Instruction Set
Computer
Key features
Large number of general purpose registers
or use of compiler technology to optimize register use
Limited and simple instruction set
Emphasis on optimising the instruction pipeline
All operations on data apply to data in registers
The only operations that affect memory are
load and store operations
The instruction formats are few in number, with
all instructions typically being one size.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Basic Pipelining: A "Typical" RISC IS
32-bit fixed format instruction (3 formats)
ALU instructions, Load/Store, Branches and Jumps
32 32-bit GPR (R0 contains zero, DP take pair)
3-address, reg-reg arithmetic instruction
Single address mode for load/store: base +
displacement
• – no indirection
Simple branch conditions /Jump
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
A simple implementation of a RISC Instruct
Set Every instruction can be implemented in at most 5 cycles.
▪Instruction fetch cycle (IF): send PC to memory and fetch the current
instruction; Update the PC to the next PC by adding 4.
▪Instruction decode/register fetch cycle (ID): decode the instruction and
read the registers. For a branch, do the equality test on the
registers, and compute the target address
▪Execution/effective address cycle (EX): The ALU operates on the
operands prepared in the prior cycle, performing:
– Memory reference: ALU adds the base register and the offset.
LW 100(R1) => 100 + R1
– Register-Register ALU instruction: ALU performs the operation
specified by the opcode on the values read from the register file.
ADD R1,R2,R3 => R1 + R2
– Register-Immediate ALU instruction: ALU performs the operation
specified by the opcode on the first values read from the register file
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
A simple implementation of a RISC
Instruction Set
Every instruction can be implemented in at most 5 cycles.
▪Memory access (MEM). Read the memory using the
effective address computed in the previous
cycle if the instruction is a load; If it is a store, write
the data to the memory
▪Write-back cycle (WB). For Register-Register ALU
instruction or
load instruction, write the result into the register file,
whether it comes from the memory (for a load) or from
the ALU (for an ALU instruction)
Branch instructions require 2 cycles, store instructions
require 4 cycles, others require 5 cycles.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
An example of an instruction
execution
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
U
AL
P
C
Reg File
ory
Mem
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
U
AL
P
C 000001000010001000011
…
00001100011111110000
Reg File
0…
y
Memor
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
U
AL
P
C 000001000010001000011
…
00001100011111110000
Reg File
0…
y
Memor
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
P
C 000001000010001000011
…
00001100011111110000
Reg File
0…
y
Memor
Fetch the
instruction
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
000001000010001000011
P …
C 00001100011111110000
Reg File
0…
Fetch the
y
Memor
instruction Move PC
to the next
instruction
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
Reg
File
0
R
x
0
y
g e
Re Fil
R
z
1
ory
Mem
R
R3
2 …
1
R
3 BITS Pilani, Deemed to beAct,
University under Section 3 of UGC
1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
Reg
File
ADD R 0
0 x
R y
g e
Re Fil
decode the
1 z instruction
ory
Mem
R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
Reg
File
ADD R 0
R1
0 x
R y
g e
Re Fil
decode the
1 z instruction
ory
Mem
R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
Reg
File
ADD R1 R 0
R2
0 x
R y
g e
Re Fil
decode the
1 z instruction
ory
Mem
R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
IR
000001000010001000011
U
AL
…
Reg File
ADD R1 R2 R3 R0 0
R x
1 y
g e
Re Fil
R z
ory
Mem
2
R
R3 …
13
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011
U
AL
…
y
Reg
File
ADD R1 R2 R3 0
R0
x
R
y
g e
Re Fil
1
z
R
ory
Mem
2
R3
R …
1
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execution (EX)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011
U
AL
…
y
Reg File
ory
Mem
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execution (EX)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011 x+
U
AL
… y
y
Execute the
operation
Reg File
ory
Mem
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Writeback (WB)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011 x+
U
AL
… y
y
Reg
File
R 0
0 x
R y Write the result to the
g e
Re Fil
register
1 z
ory
Mem
R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Writeback (WB)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011 x+
U
AL
… y
y
Reg
File
R 0
0 x
R y Write the result to the
g e
Re Fil
register
1 x+y
ory
Mem
R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956