0% found this document useful (0 votes)

26 views570 pages

Computer Architecture Insights

Uploaded by

kishoremarikani21

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

26 views570 pages

Computer Architecture Insights

Uploaded by

kishoremarikani21

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 570

Computer

Organization &
BITS
Architecture DS
Rao
Pilani 98663582
Pilani Campus 71
Computer Architecture and
Organization-1
• Architecture is those attributes visible to the programmer and having direct
impact on the logical execution of the program
– Instruction set, Number of bits used for data types, I/O mechanisms,
Memory addressing techniques
– e.g. x86 architecture, IBM/360 architecture

• Organization is implementation of computer system in terms of its

interconnection of functional units
– Control signals
– Interfaces between computer and peripherals
– Memory technology
2
BITS Pilani, Pilani
Computer
Architecture
and Organization-2

• Architecture Question?
– Is there a multiply/division instruction available?

• Organization Question?
– Is multiplication implemented by separate
hardware or is it done by repeated addition?

3
BITS Pilani, Pilani
Computer Organization vs. Computer
Architecture

• For example, it is an architectural design issue whether a computer will have a multiply
instruction.

• It is an organizational issue whether that instruction will be implemented by a special

multiply unit or by a mechanism that makes repeated use of the add unit of the system

• Many computer manufacturers offer a family of computer models, all with the same
architecture but with differences in organization.

• Different models in the family have different price and performance characteristics.
• A prominent example of both these phenomena is the IBM system /370 architecture. It is
first introduced in 1970 and included a number of models.
• This gives code compatibility

— At least backwards
• Organization differs between different versions
BITS Pilani, Deemed to be University under Section 3 of UGC
Structure and Functions of various components of
a computer
system.
• Structure is the way in which components relate to each other.
 Structure = static relations among components
• Function is the operation of individual components as part of the structure.
 Function = dynamic behaviour of each component
• In terms of description, we have two choices, starting at the bottom and
building up to a complete description or beginning with a top view and
decomposing the system into the subparts.

• The computer system will be described from top down.

BITS Pilani, Deemed to be University under Section 3 of UGC

How to Describe
Computer System?
• Computer is a complex system!!!
– Contains millions of electronics components

• Function is the operation of individual

components as part of the structure

• Structure is the way in which components are

related to each other
6
BITS Pilani, Pilani
Functional View of
Computer

• All computer functions are:

— Data processing

— Data storage

— Data movement

— Control

7
BITS Pilani, Pilani
Computer
Operations
Operations (b) Storage Operation (c) Processing from/to storage
Operations (a) Data movement

Operation (d)
Processing from storage to I/O

8
BITS Pilani, Pilani
Structu
re

• The figure shows the simplest depiction of a computer.

• The computer interacts in some fashion with its external environment.
• In general, all of its linkages to the external environment can be classified as peripheral
devices or communication lines.

Peripherals
Communication lines

COMPUTER
•Storage
•processing

The Computer
BITS Pilani, Deemed to be University under Section 3 of UGC
Structural View of a
Computer

Peripherals Comput
er
Central Main
Processin Memor
g Unit y
Computer
Systems
Interconnecti
on

Input
Outp
Communication ut
lines

10
BITS Pilani, Pilani
• There are four main structural components:
• Central processing unit (CPU): controls the operation of the
computer and performs its data processing functions; often simply
referred to as processor.

• Main memory: Stores data.

• I/O: Moves data between the computer and its external
environment.

• System Interconnection: Some mechanism that provides for

communication among CPU, main memory and I/O.

BITS Pilani, Deemed to be University under Section 3 of UGC

Structural View of
the CPU

CPU

Comput Arithmet
er Register ic and
I/O s Logic
System CPU Unit
Bus
Internal CPU
Memor Interconnecti
y on

Contr
ol
Unit

12
BITS Pilani, Pilani
Structural View of
the CPU

•A common example of system interconnection is by means of a system bus, consisting

of number of conducting wires to which all the other components attach.

•The most complex component is the CPU and its major structural components are as

follows:

•Control unit: Controls the operation of the CPU and hence the computer.

•Arithmetic and logic unit (ALU): Performs the computers data processing functions.

•Registers: Provides storage internal to the CPU.

•CPU Interconnection: Some mechanism that provides for communication among the

control unit, ALU and registers.

BITS Pilani, Pilani

Brief History of
Computers
• ENIAC was the first general purpose electronic digital
computer
– By John Mauchly and John Eckert in 1946
• Von Neumann Machine (1946) called as IAS( Institute
of advanced study)
– Based on stored program concept
• PDP-1 Computer (1957)
– Developed by Digital Equipment Corporation (DEC)
– First step towards mini computers
• IBM SYSTEM/360 (1964)
– First planned family of computers
14
BITS Pilani, Pilani
ENIAC -
background
• Electronic Numerical Integrator And Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
– Too late for war effort
• Used until 1955
15
BITS Pilani, Pilani
ENIAC -
details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
16
BITS Pilani, Pilani
von
Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from
memory and executing
• Input and output equipment operated by control
unit
• Princeton Institute for Advanced Studies
– IAS ( Institute for Advanced Study)
• Completed 1952
17
BITS Pilani, Pilani
Structure of von Neumann
machine

18
BITS Pilani, Pilani
IAS -
details
• 1000 x 40 bit words
– Binary number

– memory of the IAS consists of 4,096 storage locations, called words, of 40

binary digits (bits) each
– 2 x 20 bit instructions

• Set of registers (storage in CPU)

– Memory Buffer Register

– Memory Address Register

– Instruction Register

– Instruction Buffer Register

– Program Counter

– Accumulator
19
– Multiplier Quotient
BITS Pilani, Pilani
IAS -
details

20
BITS Pilani, Pilani
Structure of
IAS – detail

21
BITS Pilani, Pilani
Commercial
Computers

• 1947 - Eckert-Mauchly Computer Corporation

• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
– Faster
– More memory
 IBM
• Punched-card processing equipment
• 1953 - the 701
– IBM’s first stored program computer
– Scientific calculations
• 1955 - the 702
– Business applications
• Lead to 700/7000 series
22
BITS Pilani, Pilani
Transisto
rs
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al
23
BITS Pilani, Pilani
Transistor Based
Computers
• Second generation machines
• NCR & RCA produced small transistor
machines
• IBM 7000
• DEC - 1957
– Produced PDP-1(Programmed Data
Processor)

24
BITS Pilani, Pilani
Microelectron
ics
• Literally - “small electronics”
• A computer is made up of gates, memory cells
and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer

25
BITS Pilani, Pilani
DEC
PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
– $100k+ for IBM 360
• Embedded applications & OEM(original equipment
manufacturers)
• BUS STRUCTURE

BITS Pilani, Pilani

DEC - PDP-8 Bus
Structure

models of the PDP-8 used a structure that became virtually universal for
microcomputers: the bus structure

BITS Pilani, Pilani

IBM 360
series
• 1964
• Replaced (& not compatible with) 7000 series
• First planned “family” of computers
– Similar or identical instruction sets
– Similar or identical O/S
– Increasing speed
– Increasing number of I/O ports (i.e. more terminals)
– Increased memory size
– Increased cost
• Multiplexed switch structure

BITS Pilani, Pilani

Semiconductor
Memory
• 1970
• Fairchild
• Size of a single core
– i.e. 1 bit of magnetic core storage
• Holds 256 bits
• Non-destructive read
• Much faster than core
• Capacity approximately doubles each year

BITS Pilani, Pilani

Int
el
• 1971 - 4004
– First microprocessor
– All CPU components on a single chip
– 4 bit
• Followed in 1972 by 8008
– 8 bit
– Both designed for specific applications
• 1974 - 8080
– Intel’s first general purpose microprocessor

BITS Pilani, Pilani

Computer
Generations
• Vacuum tube (1946-1957) & Transistor (1958-
1964)
• Integrated Circuits
• Small scale integration - 1965 on
– Up to 100 devices on a chip
• Medium scale integration - to 1971
– 100 - 3,000 devices on a chip
• Large scale integration - 1971-1977
– 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
– 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 onwards
– Over 100,000,000 devices on a chip
31
BITS Pilani, Pilani
x86
Evolution-1
• 1971 - 4004
– First microprocessor of 4 bit
– All CPU components on a single chip
– Followed in 1972 by 8008 (8 bit processor)
– Both designed for specific applications
• 8080
– First general purpose microprocessor
– Process/move 8 bit data at a time
– Used in first personal computer – Altair

32
BITS Pilani, Pilani
x86
Evolution-2
• 8086
– Much more powerful (16 bit data)
– Instruction cache, pre-fetch few instructions
– 8088 (8 bit external bus) used in first IBM PC
• 80286
– 16 MByte memory addressable
– Up from 1MB (in 8086)
• 80386
– 32 bit processor with multitasking support

33
BITS Pilani, Pilani
x86
Evolution-3
• 80486
– Sophisticated powerful cache and instruction pipelining
– Built in maths co-processor
• Pentium
– Superscalar
– Multiple instructions executed in parallel
• Pentium Pro
– Increased superscalar organization
– Aggressive register renaming
– Branch prediction and Data flow analysis
34
BITS Pilani, Pilani
x86
Evolution-4
• Pentium II
– MMX technology, graphics, video & audio processing
• Pentium III
– Additional floating point instructions for 3D graphics
• Pentium 4
– Further floating point and multimedia enhancements
• Itanium Series
– 64 bit with Hardware enhancements to increase speed
• What’s next???
– Multi core architectures
35
BITS Pilani, Pilani
BITS Pilani
Pilani|Dubai|Goa|Hyderabad

Fetch and Execute cycles

BITS Pilani, Pilani

What is a
program?
• A logical group of instructions to achieve
a specific task.
• An instruction is a sequence of steps
• For each step an arithmetic or
logical operation or
data movement is done.

BITS Pilani, Pilani

Addition of Two
Numbers
Start
Load R1,A
Load R2,B
R3=R1+R2
Store
C,R3
End of
program
38
I P ila n i,
Components (start
here)
• The Control Unit and the Arithmetic and Logic
Unit constitute the Central Processing Unit
• Data and instructions need to get into the
system and results out
– Input/output
• Temporary storage of code and results is
needed
– Main memory

BITS Pilani, Pilani

Computer
Components: Top
Level View

BITS Pilani, Pilani

Instruction
Cycle
• Two steps:
– Fetch
– Execute

BITS Pilani, Pilani

Fetch
Cycle
• Program Counter (PC) holds address of next
instruction to be fetched
• Processor fetches instruction from memory
location pointed to by PC
• Increment PC
– Unless told otherwise
• Instruction loaded into Instruction Register
(IR)
• Processor interprets instruction and performs
required actions
BITS Pilani, Pilani
Execute
Cycle
• Processor-memory
– data transfer between CPU and main memory
• Processor I/O
– Data transfer between CPU and I/O module
• Data processing
– Some arithmetic or logical operation on data
• Control
– Alteration of sequence of operations
– e.g. jump
• Combination of above

BITS Pilani, Pilani

0 34 15

Opcode Address

Instruction format
• The instruction format provides 4 bits for the opcode, so that there can be as many as 24 = 16
different opcodes.
• ,and up to 212 = 4096 (4K) words of memory can be directly addressed.

opcode function
0001 Load AC from memory
0010 Store AC to Memory
0101 Add to AC from
Memory

BITS Pilani, Pilani

Example of Program
Execution

The program fragment shown

adds the contents of the
memory word at address 940
to the contents of the
memory word at address 941
and stores the result in the
latter location

BITS Pilani, Pilani

The Role of
Performance

46
BITS Pilani, Deemed to be University under Section 3 of UGC
47
BITS Pilani, Deemed to be University under Section 3 of UGC
CPU Performance
• Equation
Multiple aspects to performance: helps to isolate them
• Latency = seconds / program =

– (insns / program) * (cycles / insn) * (seconds / cycle)

– Insns / program: dynamic insn count

• Impacted by program, compiler, ISA

– Cycles / insn: CPI

• Impacted by program, compiler, ISA, micro-arch

– Seconds / cycle: clock period (Hz)

• Impacted by micro-arch, technology

• For low latency (better performance) minimize all three

– Difficult: often pull against one another

48
Example we have seen: RISC vs.er Section 3 of UGC Act, 1956

–
PERFORMANCE
Performance is one of the key parameter to
evaluate a system.
If we say one system is better than another
we consider
• Response Time
• Throughput

BITS Pilani, Deemed to be University under Section 3 of UGC

Computer
Performance
• Performance is one of the key parameter to-
– Evaluate processor hardware
– Measure requirement for new systems
• When we say one computer has better
performance than another, what does it mean?
– Criterion for the performance?
• Response Time (Single computer), Throughput
(Data center)

50
BITS Pilani, Pilani
Clock
Rate
• Operation performed by processor are governed by
system clock (fundamental level of processor speed
measurement)
– Generated by quartz crystal

one clock period

• A clock cycle is the basic unit of time to execute one

operation.
• Clock Rate (clock cycles per second in MHz or GHz) is
inverse of clock cycle time (clock period) 51
BITS Pilani, Pilani
Is Clock Rate
Enough?

• As we know, instruction execution takes several discrete steps

– Fetching, decoding, ALU operation, fetching data etc.
– It takes multiple clock cycles for its execution
• Different instructions takes different number of cycles for their
execution
– LOAD, ADD, SUB, JUMP etc.
• Thus clock speed doesn’t tell the whole story!!!

52
BITS Pilani, Pilani
Performance: Application
Specific

• Performance
– How a processor performs when executing a given application

– Application performance depends upon

• Speed of the processor
• Instruction set
• Choice of language
• Efficiency of the compiler
• Programming skill of the user

53
BITS Pilani, Pilani
CPU
Performance
• To maximize performance, need to minimize
execution time

performance = 1 / execution_time

If X is n times faster than Y, then

performanceX execution_timeY
= =n
performanceY execution_timeX

54
BITS Pilani, Pilani
Cycles Per Instruction
(CPI)
• For any given processor, number of cycles
required varies for different types of
instructions
– e.g. load, store, branch, add, mul etc.
• Hence CPI is not a constant value for a
processor
• Needs to calculate average CPI for
processor

56
BITS Pilani, Pilani
CPU Performance and it’s
Factors

• Program execution time (T) = Ic x CPI x 𝑟

– 𝑟 =cycle time, Ic = no. of instructions in the program

• Instruction execution also requires memory access

– T = Ic x [p +(m x k] x 𝑟

– p=processor cycles, m = memory references

– k = ratio of memory cycles to processor cycles

• These performance factors are influenced by

– ISA,

– compiler,(how effective the compiler is in producing an efficient machine language program

from a high-level language program)
– processor implementation

– and memory hierarchy 57

BITS Pilani, Pilani
MIPS
Rate
• Common measure of performance
– Millions Instructions Per Second (MIPS) rate
– Ic/(Execution time x 106)
– This can be written as:
=Ic/ cycle time x (Ic x CPI )x 106
= f/ CPI x 106

57
BITS Pilani, Pilani
In class
Example1
• Consider the execution of a program that results in the execution of 2 million
instructions on a 400-MHz processor. The program consists of four major types of
instructions. The instruction mix and the CPI for each instruction type are given
below, based on the result of a program trace experiment

The average CPI when the program is executed on a uniprocessor with the above trace results is
CPI = 0.6 + (2 * 0.18) + (4 * 0.12) + (8 * 0.1) = 2.24.
MIPS rate = Ic /T * 106 = f /CPI * 106
The corresponding MIPS rate is (400 * 106 )/(2.24 * 106 ) ≈ 178.
58
BITS Pilani, Pilani
Example of Performance
Measure

59
BITS Pilani, Pilani
Limitation of MIPS
Rate
• MIPS rate or instruction execution rate is also
inadequate to measure CPU performance. Why?
– Because of differences in ISA
– Ex. To execute a high level language statement
A=B+C (A,B and C are in memory) may need
different number of low level instructions for
different ISA

60
BITS Pilani, Pilani
Amdahl’s Law
Formula
• For program running on single processor

— Fraction f of code infinitely parallelizable with no scheduling

overhead
— Fraction (1-f) of code inherently serial

— T is total execution time for program on single processor

— N is number of processors that fully exploit parralle portions of code

• Conclusions

– f small, parallel processors has little effect

– N ->∞, speedup bound by 1/(1 – f)

• Diminishing returns for using more
61
processors BITS Pilani, Pilani
Amdahl’s Law

1 How much will an optimization

improve performance?
P
(1 P)  S P = proportion of running time
affected by optimization
S = speedup
What if I speedup 25% of a program’s
execution by 2x? 1.14x speedup

What if I speedup 25% of a program’s

execution by ∞? 1.33x speedup

BITS Pilani, Pilani Ca

In class
Example2
Suppose we have made the following measurements:
Frequency of FP operations = 50%
Average CPI of FP operations = 4.0 Average CPI
of other instructions = 1.33 Frequency
of FPSQR= 20%
CPI of FPSQR = 20
Assume that the two design alternatives are to
decrease the CPI of FPSQR to 2 or to
decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives
using the processor performance equation.

63
BITS Pilani, Pilani
Apply Amdahl's Law to the above question and observe the result

Amdahl’s Law eqn:

With FPSQR Speedupoverall

With FP Speedupoverall=

We get the same result using Amdahl's Law

BITS Pilani, Pilani
In class
Example3
A benchmark program is run on a 40 MHz processor.The executed program consists of
100,000 instruction executions, with the following instruction mix and clock cycle count:
Instruction Type Instruction Count Cycles per Instruction
arithmetic 45000 1
Data transfer 32000 2
Floating point 15000 2
Control transfer 8000 2
Determine the effective CPI, MIPS rate, and execution time for this program.

BITS Pilani, Deemed to be University under Section 3 of UGC

In class Example3
contnd

SOL: CPI =1.55,MIPS RATE = 25.8,

execution time =38.75ms 67

BITS Pilani, Deemed to be University under Section 3 of UGC

In class
Example4

Q2. Early examples of CISC and RISC are the VAX 11/780 and the IBM RS/6000,
respectively. Using a typical benchmark program, the following machine
characteristics results:
The final column shows that the VAX required 12 times longer than the IBM
measured in CPU
time.
•What is the relative size of the instruction count of the machine code for this

benchmark program running on the two machines?

Processor Clock Frequency Performance CPU time
• What are the CPI (Cycles Per Instruction) values for the two machines?
VAX 11/780 5 MHz 1MIPS 12 x seconds

IBM RS/6000 25 MHz 18 MIPS x seconds

BITS Pilani, Deemed to be University under Section 3 of UGC

Sol: We can express the MIPS rate as: [(MIPS rate)/106] = Ic/T.

So that:

Ic = T  [(MIPS rate)/106].

a) The ratio of the instruction count of the RS/6000 to the VAX is

[x  18]/[12x  1] = 1.5

b)For the Vax, CPI = (5 MHz)/(1 MIPS) = 5

For the RS/6000, CPI = 25/18 = 1.39

BITS Pilani, Deemed to be University under Section 3 of UGC

Benchmark
Programs
• It is a collection of a programs that provides representative
test of a computer in a particular application area
– e.g. SPEC (System Performance Evaluation Corporation)
benchmark suites
– SPEC CPU 2006 is used for measuring performance for the
computational based applications

69
BITS Pilani, Pilani
Review
Questions
• Differentiate between computer organization and architecture?
• What are the four main functions of a computer?
• What are the basic structural components of a computer?
• Describe the computer generations in brief.
• What was the first general purpose microprocessor?
• Consider two implementation of the same ISA. Computer A clock cycle time of 250
ns and a CPI of 2 for a program. Computer B has a clock cycle time of 500 ns and a
CPI of 1.2 for the same program. Which computer is faster for this program and by
how much?
• A program runs on computer A with a 2 GHz clock in 10 seconds. Another
computer B with 4 GHz run this program in 6 seconds. To accomplish this,
computer B will require P times as many clock cycles as computer A to run the
program. Find the value of P.

70
BITS Pilani, Pilani
Performance revision
Consider two different machines, with two different instruction
sets, both of which have a clock rate of 200 MHz. The following
measurements are recorded on the two machines running a
given set of
Instruction benchmark
Type programs:
Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3

Machine B
Arithmetic and 10 1
logic
Load and store 8 2
Branch 2 4
Others
a. Determine the effective 4CPI, MIPS rate, and execution time
3 for each
machine.
b. Comment on the results.
BITS Pilani, Deemed to be University under Section 3 of UGC
Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.

BITS Pilani, Deemed to be University under Section 3 of UGC

Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.

Time for 1 cycle = 1/f Hz

Time for 1 instruction= CPI X 1/f

CPI X 1/f secs for 1 instruction

There fore in 1 sec no of instructions= f/ CPI

And this MIPS= f/ (CPI X 106)

BITS Pilani, Deemed to be University under Section 3 of UGC
Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine A
Arithmetic and 8 1
logic
Load and store 4 3
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.

And this MIPS= f/ (CPI X 106)

BITS Pilani, Deemed to be University under Section 3 of UGC

Performance revision
clock rate of 200
MHz. :
Instruction Type Instruction Count (millions) Cycles per Instruction
Machine B
Arithmetic and 10 1
logic
Load and store 8 2
Branch 2 4
Others 4 3
a. Determine the effective CPI, MIPS rate, and execution time for each
machine.
b. Comment on the results.

BITS Pilani, Deemed to be University under Section 3 of UGC

BITS
Pilani
Pilani|Dubai|Goa|Hyderabad

Contact Session 3
Computer System Components and
Interconnections
Inter
Connections
• All the units must be connected
• Different type of connection for different type
of unit
– Memory
– Input/Output
– CPU

BITS Pilani, Deemed to be University under Section 3 of UGC

Computer
Modules

■I/O module: I/O is functionally similar to memory.

There are two operations; read and write.
 I/O module may control more than one external
device.
 Refer to each of the interfaces to an external
device as a port and give each a unique address
(e.g., 0, 1, … , M-1).
 In addition, there are external data paths for
the input and output of data with an external
device.
 Finally, an I/O module may be able to send
interrupt signals to the processor.

BITS Pilani, Deemed to be University under Section 3 of UGC

CPU & MEMORY
Connection

• Reads instruction and data

• Writes out data (after processing)

• Sends control signals to other units

•Receives (& acts on) interrupts

Memory Connection

• Receives and sends data

• Receives addresses (of

locations)

• Receives control signals

– Read

– Write

– Timing

BITS Pilani, Deemed to be University under Section 3 of UGC

Input/Output
Connection

• Similar to memory from computer’s viewpoint

• Output
– Receive data from computer
– Send data to peripheral
• Input
– Receive data from peripheral
– Send data to computer
• Receive control signals from computer
• Send control signals to peripherals
– e.g. spin disk
• Receive addresses from computer
– e.g. port number to identify peripheral
• Send interrupt signals (control)

BITS Pilani, Deemed to be University under Section 3 of UGC

Input/Output
Connection

• Similar to memory from computer’s viewpoint

BITS Pilani, Deemed to be University under Section 3 of UGC

Buses
• There are a number of possible interconnection systems
• it has gradually given way to various point-to-point interconnection
structures, which now dominate computer system design
• Single and multiple BUS structures are most common

• e.g. Control/Address/Data bus (PC)

• e.g. Unibus (DEC-PDP)

A communication pathway connecting two or more devices is a bus.

• Parallel lines on circuit boards

• Ribbon cables

• Strip connectors on motherboards

• Sets of wires
BITS Pilani, Deemed to be University under Section 3 of UGC
Different
Bus
Data Bus
• Carries data
– Remember that there is no difference between “data” and “instruction”
at this level
• Width is a key determinant of performance
– 8, 16, 32, 64 bit

Address bus

 Identify the source or destination of data

e.g. CPU needs to read an instruction (data) from a given
location in memory

 Bus width determines maximum memory capacity of

system

e.g. 8080 has 16 bit address bus giving 64k address

BITS Pilani, Deemed to be University under Section 3 of UGC
space
Control
Bus
• Control and timing information
– Memory read/write signal
– Interrupt request
– Clock signals

BITS Pilani, Deemed to be University under Section 3 of UGC

Single Bus
Problems
• Lots of devices on one bus leads to:
– Propagation delays
• Long data paths mean that co-ordination of bus use can
adversely affect performance
• If aggregate data transfer approaches bus capacity
• Most systems use multiple buses to overcome
these problems

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiple Bus
Hierarchies
 A great number of devices on a bus will cause performance to suffer
 Propagation delay - the time it takes for devices to coordinate the use of the bus
 The bus may become a bottleneck as the aggregate data transfer demand
approaches the capacity of the bus (in available transfer cycles/second)
 Traditional Hierarchical Bus Architecture
 Use of a cache structure insulates CPU from frequent accesses to main memory
 Main memory can be moved off local bus to a system bus
 Expansion bus interface
o buffers data transfers between system bus and I/O controllers on expansion
bus
o insulates memory-to-processor traffic from I/O traffic

BITS Pilani, Deemed to be University under Section 3 of UGC

Traditional
(ISA) (with
cache)  Main memory can be
moved off of the local
bus onto a system bus.
 make use of one or
more expansion
buses

small computer system

interface

LANs

WAN
s BITS Pilani, Deemed to be University under Section 3 of UGC
High Performance
Bus  High-performance Hierarchical Bus
Architecture

 Traditional hierarchical bus breaks

down as higher and higher
performance is seen in the I/O devices
to high-speed LANs, such as Fast Ethernet at  Incorporates a high-speed bus
100 Mbps
o specifically designed to
support high- capacity I/O
devices

o brings high-demand
devices into closer integration
with the processor and at the
same time is independent of the
processor

o Changes in processor
architecture do not affect the high-
BITS Pilani, Deemed to be University under Section 3 of UGC
speed bus, and vice versa
Bus
Types
• Dedicated
– Separate data & address lines

• Multiplexed
– Shared lines
– Address valid or data valid control line
– Advantage - fewer lines
– Disadvantages
• More complex control
• Reduction in performance (cannot take place in parallel)

BITS Pilani, Deemed to be University under Section 3 of UGC

Bus
Arbitration
• More than one module controlling the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one time
• Arbitration may be centralised or distributed
• Centralised
– Single hardware device controlling bus access
• Bus Controller
– May be part of CPU or separate
• Distributed
– Each module may claim the bus
– Control logic on all modules

BITS Pilani, Deemed to be University under Section 3 of UGC

What is Bus Arbitration, Bus Mastering and DMA?

• Bus Arbitration – an elaborate system for resolving bus

control conflicts and assigning priorities to the requests for
control of the bus.

• Bus Mastering – a method of enabling different device

controllers on the bus to ‘talk’ to one an other,
without having to go through the CPU.

• DMA(Direct Memory Access) – a method of transferring data

from a hard disk to main memory without having to go
through the CPU.

91
BITS Pilani, Deemed to be University under Section 3 of UGC
Bus Arbitration Methods

• Centralized
Centralized bus arbitration requires hardware (arbiter)that
will grant the bus to one of the requesting devices. This
hardware can be part of the CPU or it can be a separate
device on the motherboard.

• Decentralized
Decentralized arbitration there isn't an arbiter, so the devices
have to decide who goes next. This makes the devices more
complicated, but saves the expense of having an arbiter.

92
BITS Pilani, Deemed to be University under Section 3 of UGC
Centralized
arbitration BBS
Y

BR
Processor

DMA DMA
controller controller
BG 1 BG 2
1 2

•Bus arbiter may be the processor or a separate unit connected to

the bus.
•Normally, the processor is the bus master, unless it grants bus
membership
to one of the DMA controllers.
•DMA controller requests the control of the bus by asserting the
Bus Request (BR) line.
•In response, the processor activates the Bus-Grant1 (BG1) line,
indicating
that the controller may use the bus when it is free.
•BG1 signal is connected to all DMA controllers in a daisy 93 chain
fashion. BITS Pilani, Deemed to be University under Section 3 of UGC
Centralized arbitration
(contd..)

DMA controller 2
asserts the BR Tim
signal. Processor
e
BR
asserts
the BG1 signal

BG1 BG1 signal

propagates to
BG2 DMA#2.

B BS
Y
Bus
mast
er P rocess DMA controller P rocess
or 2 or

Processor relinquishes control

of the bus by setting BBSY to
1.
94
BITS Pilani, Deemed to be University under Section 3 of UGC
Centralized arbitration
(contd..)

• Centralized arbitration scheme with one Bus-Request (BR) line

and one Bus-Grant (BG) line forming a daisy chain.
• Several pairs of BR and BG lines are possible, perhaps one per
device as in the case of interrupts.
• Bus arbiter has to ensure that only one request is granted at
any given time.
• It may do so according to a fixed priority scheme, or a
rotating
priority scheme.
• Rotating priority scheme:
 There are four devices, and initial priority is 1,2,3,4.
 After the request from device 1 is granted, the priority
changes to 2,3,4,1. 95
BITS Pilani, Deemed to be University under Section 3 of UGC
Distributed
arbitration
• All devices waiting to use the bus share the responsibility
of carrying out the arbitration process.
 Arbitration process does not depend on a central arbiter and hence distributed
arbitration has higher reliability.
• Each device is assigned a 4-bit ID number.
• All the devices are connected using 5 lines, 4 arbitration
lines to transmit the ID, and one line for the Start-
Arbitration signal.
• To request the bus a device:
 Asserts the Start-Arbitration signal.
 Places its 4-bit ID number on the arbitration lines.

• The pattern that appears on the arbitration lines is the

logical-OR of all the 4-bit device IDs placed on the
arbitration lines. BITS Pilani, Deemed to be University under Section 3 of UGC
25
Distributed
•Device Aarbitration
has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
•Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
•Pattern that appears on the arbitration lines is the logical OR of the
patterns: -
Pattern 0111 appears on the arbitration lines.

Arbitration process:
•Each device compares the pattern that appears on the arbitration
lines to its own ID, starting with MSB.
•If it detects a difference, it transmits 0s on the arbitration lines for
that and all
lower bit positions.
•Device A compares its ID 5 with a pattern 0101 to pattern 0111.
•It detects a difference at bit position 2, as a result, it transmits a
pattern 0100 on the arbitration lines.
•The pattern that appears on the arbitration lines is the logical-OR 26of
0100 and 0110, which is 0110. BITS Pilani, Deemed to be University under Section 3 of
Timin
g
• Co-ordination of events on bus
• Synchronous
– Events determined by clock signals and synchronized
on leading edge of clock
– All devices can read clock line
– Usually a single cycle for an event
Asynchronous
⮚ The occurrence of one event on a bus
depends on the occurrence of a previous event
⮚ Events on the bus are not synchronized with
the clock
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous
bus

Bus clock

Bus cycle

28
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.) T im
e
Bus
clock

Address
and
comman
d

Dat
a

t0 t1 t2

Bus
Master places the cycle
device address Addressed slave
and command on places Master “strobes” the
the bus, and data on the data data on the data
indicates that lines lines into its input
it is a Read buffer, for a Read
•In operation. operation.
case of a Write operation, the master places the data on the bus
along with the address and commands at time t0.
•The slave strobes the data into its input buffer 10
at time t2. 0
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.)
• Once the master places the device address and command on the bus, it
takes time for this information to propagate to the devices:
– This time depends on the physical and electrical characteristics of
the
bus.
• Also, all the devices have to be given enough time to decode the
address
and control signals, so that the addressed slave can place data on the
bus.

• Width of the pulse t1 - t0 depends on:

– Maximum propagation delay between two devices connected to

the bus.
10
1
– Time taken by all BITS
thePilani,
devices to todecode
Deemed theunder
be University address
Sectionand control
3 of UGC
Synchronous bus(contnd.)
• At the end of the clock cycle, at time t2, the master strobes the data on the
data lines into its input buffer if it’s a Read operation.
– “Strobe” means to capture the values of the data and store them into
a buffer.
• When data are to be loaded into a storage buffer register, the data should
be available for a period longer than the setup time of the device.

• Width of the pulse t2 - t1 should be longer than:

– Maximum propagation time of the bus plus

– Set up time of the input buffer register of

the master.
10
2
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.) Time
Address & Bus
command clock Data reaches
appear on Seen the master.
tAM
the bus. by
master
Address
and
Dat
comman
Address & d a
tDM
command
Seen by
reach the slave tAS
slave. Address Data
and appears
on the bus.
comman
d
Dat
a
tDS

t t 2
•Signals do not appear on
0
the bus as soon as they are placed on the
1

t
bus, due to the propagation delay in the interface circuits.
•Signals reach the devices after a propagation delay which
depends on the characteristics of the bus.
•Data must remain on the bus for some time after t2 equal to
10
the hold time of the buffer. 3
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous bus(contnd.)
• Most buses have control signals to represent a
response from the slave.
• Control signals serve two purposes:
– Inform the master that the slave has recognized the address, and is ready to
participate in a data transfer operation.
– Enable to adjust the duration of the data transfer operation based on the
speed of the participating slaves.

• High-frequency bus clock is used:

– Data transfer spans several clock cycles instead of just one clock cycle as in the
earlier case.

10
4
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous
bus(contnd.)
Slave-ready signal is an acknowledgement from the slave to the master to confirm that the
valid data has been sent. Depending on when the slave-ready signal is asserted, the duration
of the data transfer can change.
Address & Tim
command e
requesting a Read 1 2 3 4
operation appear
on the bus.
C loc
k

Addres
s

C omman
d Master strobes data
into the input buffer.
Dat
a

Slave-
ready

Slave places the data on the Clock changes are seen by all the
bus,
and asserts Slave-ready signal. devices at the same
time. 34
BITS Pilani, Deemed to be University under Section 3 of UGC
Synchronous Timing
Diagram

BITS Pilani, Deemed to be University under Section 3 of UGC

Asynchronous
bus
⚫ Data transfers on the bus is controlled by a handshake between the master and the
slave.

⚫ Common clock in the synchronous bus case is replaced by two timing control lines:

⚫ Master-ready,

⚫ Slave-ready.

⚫ Master-ready signal is asserted by the master to indicate to the slave that it is ready
to participate in a data transfer.

⚫ Slave-ready signal is asserted by the slave in response to the master-ready from the
master, and it indicates to the master that the slave is ready to participate in a data
transfer.

36
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous
bus(contnd.) T im
e
Address
and
command

Master-

ready
Dat
a
S lave-

t0 t1 t2 t3 t4 t5
ready
Bus cycle
t0 - Master places the address and command information on the bus.
t1 - Master asserts the Master-ready signal. Master-ready signal is asserted
at t1 instead of t0 t2 - Addressed slave places the data on the bus and asserts
the Slave-ready signal.
t3 - Slave-ready signal arrives at the master.
t4 - Master removes the address and command information.
and the Slave-ready
t5 - Slave receives the signal from of the Master-ready signal from 1 to 0. It 37
transition
the bus. BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous vs.
Synchronous bus

• Advantages of asynchronous bus:

– Eliminates the need for synchronization between the sender and the receiver.

– Can accommodate varying delays automatically, using the Slave-ready signal.

• Disadvantages of asynchronous bus:

– Data transfer rate with full handshake is limited by two-round trip delays.
– Data transfers using a synchronous bus involves only one round trip delay, and
hence a synchronous bus can achieve faster rates.

38
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous Timing – Read
Diagram

BITS Pilani, Deemed to be University under Section 3 of UGC

Asynchronous Timing – Write
Diagram

BITS Pilani, Deemed to be University under Section 3 of UGC

PCI
⚫ Peripheral Component Interconnect Bus
⚫ Introduced in 1992

⚫ Low-cost bus

⚫ Processor independent

⚫ Plug-and-play capability

⚫ In today’s computers, most memory transfers involve a burst of data rather than just one
word. The PCI is designed primarily to support this mode of operation.
⚫ The bus supports three independent address spaces: memory, I/O, and configuration.

⚫ we assumed that the master maintains the address information on the bus until data transfer
is completed. But, the address is needed only long enough for the slave to be selected. Thus,
the address is needed on the bus for one clock cycle only, freeing the address lines to be used
for sending data in subsequent clock cycles. The result is a significant cost reduction.
⚫ A master is called an initiator in PCI terminology. The addressed device that responds to read
and write commands is called a target.
BITS Pilani, Deemed to be University under Section 3 of UGC
PCI Bus
Arbiter

 PCI makes use of a centralized, synchronous arbitration

scheme
 each master has a unique request (REQ) and grant
(GNT) signal
 does not dictate a particular arbitration algorithm
 a first-come-first-served approach
 a round-robin approach
BITS Pilani, Deemed to be University under Section 3 of UGC
 or some sort of priority scheme.
PCI Bus
Arbitration
devices A and B are IRDY# s/t/s Initiator
arbitrating Ready.
Driven by current bus
master (initiator of
transaction). During a
read, indicates that the
master is prepared to
accept data; during a
write, indicates that
valid data are present
on AD.
TRDY# s/t/s Target
R eady.
Driven by the target
(selected device).
During a read,
indicates that valid data
are present on AD;
during a write, indicates
that target is ready to
BITS Pilani, Deemed to be University under Section 3 of UGC
PCI Bus Arbitration between Two
Masters
a. At some point prior to the start of clock 1, A has asserted its RE Q
signal. The

arbiter samples this signal at the beginning of clock cycle 1.

b. During clock cycle 1, B requests use of the bus by asserting its

R E Q signal.

c. At the same time, the arbiter asserts GNT-A to grant bus access to
A.

d. Bus master A samples GNT-A at the beginning of clock 2 and

learns that it has been granted bus access. It also finds IRDY and

TRDY de-asserted, indicating that the bus is idle. Accordingly, it

asserts FRAME and places the address information on the address

bus and the command on the C/BE bus (not

115
shown). It also continues to assert
BITS REQ-A, because
Pilani, Deemed it has
to be University under a second
Section 3 of
PCI Bus Arbitration between Two
Masters
e. The bus arbiter samples all REQ lines at the beginning of clock 3 and makes an
arbitration

decision to grant the bus to B for the next transaction. It then asserts GNT-B and
deasserts

GNT-A. B will not be able to use the bus until it returns to an idle state.

f. A deasserts FRAME to indicate that the last (and only) data transfer is in
progress. It puts

the data on the data bus and signals the target with IRDY.The target reads the
data at the

beginning of the next clock cycle.

g. At the beginning of clock 5, B finds IRDY and FRAME deasserted and so is able
to take

control of the bus by asserting FRAME. It also deasserts its REQ line, because it
only wants to

perform one transaction.

116
Subsequently, master A is granted access to BITS
the Pilani,
bus for its next
Deemed transaction.
to be University under Section 3 of
Progra
m

https://yasmin-cpu-os-
simulator.software.informer.com/6.1/

46
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani Campus

Chapter 9
• Computer
Arithmetic
Arithmetic & Logic Unit (ALU)
 Part of the computer that actually performs arithmetic
and logical operations on data
 All of the other elements of the computer system are
there mainly to bring data into the ALU for it to process
and then to take the results back out
 Based on the use of simple digital logic devices that can
store binary digits and perform simple Boolean logic
operations

BITS Pilani, Deemed to be University under Section 3 of UGC

ALU Inputs and
outputs

Fig
9.1

BITS Pilani, Deemed to be University under Section 3 of UGC

Integer Representation
In the binary number system arbitrary numbers can be
represented with:
▪The digits zero and one
▪The minus sign (for negative numbers)
▪The period, or radix point (for numbers with a fractional
component)

▪For purposes of computer storage and processing we do not

have the benefit of special symbols for the minus sign and
radix point

▪Only binary digits (0,1) may be used to represent numbers

BITS Pilani, Deemed to be University under Section 3 of UGC

Signed Nos 2’s
complement
representation
Eg +6=
0110
-6= 1010

BITS Pilani, Deemed to be University under Section 3 of UGC

Consider any binary number in 2’s
complement 101101
= -26-1.1 + 24.0 + 23.1 + 22.1 +
21.0+ 20.1
=-32 + 0 + 8 + 4 + 0 + 1= -19
The 2’s complement of 101101 is
010011= 19

Now if a 1 is appended then

1101101 ?

1101101= -27-1.1
+ 25.1 + 24.0 + 23.1 +
22.1+ 21.0 + 20.1
= -64+ 32 +0 +8+4+0+1
BITS Pilani, Deemed to be University under Section 3 of UGC
Addition and Subtraction

OVERFLOW RULE: If two numbers are added, and they are both positive or
both negative, then overflow occurs if and only if the result has the opposite
sign.

BITS Pilani, Deemed to be University under Section 3 of UGC

Addition and Subtraction

SUBTRACTION RULE: To subtract one number (subtrahend) from

another (minuend), take the twos complement (negation) of the
subtrahend and add it to the minuend.

BITS Pilani, Deemed to be University under Section 3 of UGC

Block Diagram of Hardware for
Addition and Subtraction

OF= Overflow bit

SW =Switch
(select addition
or subtraction)
BITS Pilani, Deemed to be University under Section 3 of UGC
Multiplication of
UNSIGNED
INTEGERS
1.Multiplication involves the generation of partial products, one for
each digit in the multiplier. These partial products are then
summed to produce the final product.
2.The partial products are easily defined. When the multiplier
bit is 0, the partial product is 0.When the multiplier is 1, the
partial product is the multiplicand.
3.The total product is produced by summing the partial products.
For this operation, each successive partial product is shifted one
position to the left relative to the preceding partial product.
4.The multiplication of two n-bit binary integers results in a product
of up to 2n bits in length (e.g., 11 X 11=1001).

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add First

cycl
e
0 0101 1110 1011 Shift

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
0 1101 1111 1011 Add

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of
UNSIGNED
INTEGERS
C A Q M
0 0000 1101 1011 Initial value
0 1011 1101 1011 Add 1st cycle
0 0101 1110 1011 Shift
0 0010 1111 1011 shift 2nd cycle
0 1101 1111 1011 Add 3rd cycle
0 0110 1111 1011 shift

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of SIGNED INTEGERS

11 (1011) by 13 (1101) to get 143

(10001111)

-5(1011) times -3 (1101) equals -113

(10001111)

BITS Pilani, Deemed to be University under Section 3 of UGC

Booth Multiplier

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial
value

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand Comments10 A-M
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiplication of -8 X 5
01
A+M
A Q(multiplier) Q-1 M(Multiplicand 10 A-M
Comments
00000 00101 0 11000 Initial value
01000 00101 0 11000 A-M
00100 00010 1 11000
11100 00010 1 11000 A+M
11110 00001 0 11000
00110 00001 0 11000 A-M
00011 00000 1 11000
11011 00000 1 11000 A-M

BITS Pilani, Deemed to be University under Section 3 of UGC

The basic grammar school division algorithm
Division Algorithm and Hardware
 The Divisor register, ALU, and
Remainder register are all 64 bits wide,
Dividend  Quotient x Divisor 
Remainder
with only the Quotient register being
32 bits.

 The 32-bit divisor starts in the left

half of the Divisor register and is
• shifted right 1 bit each iteration.

 The remainder is initialized with the

dividend. Control decides when to shift
the Divisor and Quotient
• registers and when to write the new
• value into the Remainder register.
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
2a: shift the quotient register to
the left , seting up the new
rightmost bit to 1

it takes n + 1 steps to get the

proper quotient and remainder
BITS Pilani, Deemed to be University under Section 3 of UGC
Division 7/2

Shift left logical

(sll) BITS Pilani, Deemed to be University under Section 3 of UGC
9/4
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001

42
9/4
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 1001

43
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001

15
5
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001

15
6
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 0000
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
2 1: Rem= Rem-Div 0000 0010 0000 1110 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0010 0000 0000 1001
3: Shift Div Right 0000 0001 0000 0000 1001

15
7
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 0000
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
2 1: Rem= Rem-Div 0000 0010 0000 1110 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0010 0000 0000 1001
3: Shift Div Right 0000 0001 0000 0000 1001
3 1: Rem= Rem-Div 0000 0001 0000 1111 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0001 0000 0000 1001
3: Shift Div Right 0000 0000 1000 0000 1001

15
8
n step Quotient Divisor remainder
0 Initial value 0000 0100 0000 0000 1001
1 1: Rem= Rem-Div 0000 0100 0000 1100 0000
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0100 0000 0000 1001
3: Shift Div Right 0000 0010 0000 0000 1001
2 1: Rem= Rem-Div 0000 0010 0000 1110 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0010 0000 0000 1001
3: Shift Div Right 0000 0001 0000 0000 1001
3 1: Rem= Rem-Div 0000 0001 0000 1111 1001
2b: Rem<0 => +Div, sll Q, Q0=0 0000 0001 0000 0000 1001
3: Shift Div Right 0000 0000 1000 0000 1001
4 1: Rem= Rem-Div 0000 0000 1000 0000 0001
2a: Rem>0 => sll Q, Q0=1 0001 0000 1000 0000 0001
3: Shift Div Right 0001 0000 0100 0000 0001
5 1: Rem= Rem-Div 0001 0000 0100 1111 1101
2b: Rem<0 => +Div, sll Q, Q0=0 0010 0000 0100 0000 0001
3: Shift Div Right 0010 0000 0010 0000 0001
15
9
Signed Division
Remember the signs of the divisor and dividend and then negate
the quotient if the signs disagree.
Dividend = Quotient  Divisor + Remainder
Look at the example of
± 11 ÷ ±5
+11 = +2 𝑎𝑛𝑑 + −11 = −2 𝑎𝑛𝑑 −
+5 +5
1; 1

The 1st case is easy

+11 ÷ +5: Quotient = +2, Remainder = +1

Checking the results:

11 = 2 × 5 + (+1) = 10 + 1
BITS Pilani, Deemed to be University under Section 3 of UGC
Signed Division
Dividend = Quotient  Divisor +
+11 −11
± 11 ÷ = +2 𝑎𝑛𝑑 + = −2 𝑎𝑛𝑑
Remainder
±5 +5 1;
+5 −1
2nd case If we change the sign of the dividend, the quotient must
change as well:
–11 ÷ +5: Quotient = –2

Rewriting our basic formula to calculate the remainder:

Remainder = (Dividend – Quotient × Divisor) = –11 – (–2 × +5)
= –11–(–10) = –1
So,
–11 ÷ +5: Quotient = –2, Remainder = –1

BITS Pilani, Deemed to be University under Section 3 of UGC

Signed Division

Dividend = Quotient  Divisor +

+11 −11
± 11 ÷ = +2 𝑎𝑛𝑑 + = −2 𝑎𝑛𝑑
Remainder
±5 + 1; +
5 5 −1
Checking the results again:
–11 = – 2  5 + (–1) = – 10 – 1

The same result is obtained if

–11 = – 3  5 + (4) = – 15 + 4

absolute value of the quotient would then change

How to avoid this?

BITS Pilani, Deemed to be University under Section 3 of UGC
Signed Division
Dividend = Quotient  Divisor +
± 11 ÷
Remainder +11 = +2 𝑎𝑛𝑑 + −11 = −2 𝑎𝑛𝑑 −
+5 +5
±5 1; 1
This anomalous behavior is avoided by following the rule that the
dividend and remainder must have the same signs, no matter what
the signs of the divisor and quotient.

+11
−2 𝑎𝑛𝑑 + +11 ÷ –5:
−5
1
−11
Quotient = –2,
= = +2 𝑎𝑛𝑑
−5 –11 ÷ –5:
Remainder = +1
−1
; Quotient = +2,
Remainder = –1

Thus the correctly signed division algorithm negates the quotient if

the signs of the operands are opposite and makes the sign of the
nonzero remainder match the dividend

BITS Pilani, Deemed to be University under Section 3 of UGC

Real
Numbers

Numbers with fractions

Could be done in pure
binary
▪1001.1010 = 24 + 20
+2-1 + 2-3 =9.625
Where is the binary
point? Fixed?
▪Very
limited
Moving? BITS Pilani, Deemed to be University under Section 3 of UGC
Floating
Point
• Normalized scientific notation: single non-zero
digit to the left of the decimal (binary) point –
example: 3.5 x 109

•1.010001 x 2-5two = (1 + 0 x 2-1 + 1 x 2-2 + … + 1

x 2-6) x 2-5ten

•A standard notation enables easy exchange of data

between machines and simplifies hardware
algorithms – the
IEEE 754 standard defines how floating point
numbers are represented
BITS Pilani, Deemed to be University under Section 3 of UGC
Sign and Magnitude
Representation
Sig Expone Fractio
1
n nt 8 n 23
bit
S bits E bits F

• More exponent bits  wider range of numbers (not necessarily

more
numbers – recall there are infinite real
numbers)

• More fraction bits  higher precision

• Register value = (-1)S x F x 2E

•Since we are only representing normalized numbers,

we are guaranteed that the number is of the form
1.xxxx.. BITS Pilani, Deemed to be University under Section 3 of UGC
Sign and Magnitude
Representation

Sig Expone Fractio

1
n nt 8 n 23
bit
S bitsE bitsF

• Largest number that can be

represented:
• Smallest number that can be represented:

0.000000001ten or 1.0ten × 109 (seconds in a

nanosecond) 3,155,760,000ten or 3.15576ten × 109

(seconds in a typical
century) BITS Pilani, Deemed to be University under Section 3 of UGC
Biased Exponent Representation
 How to represent a signed exponent? Choices are …
 Sign + magnitude representation for the exponent
 Two’s complement representation
 Biased representation
 IEEE 754 uses biased representation for the exponent
 Value of exponent
= val(E) = E – Bias (Bias is a constant)

 Recall that exponent field is 8 bits for single precision

 E can be in the range 0 to 255
 E = 0 and E = 255 are reserved for special use
(discussed later)

 E = 1 to 254 are used for normalized floating point numbers

 Bias = 127 (half of 254), val(E) = E – 127

 val(E=1) = –126, val(E=127)

BITS Pilani, Deemed= 0,University
to be val(E=254) =3 of127
under Section UGC
Biased Exponent –
Cont’d

 Value of a Normalized Floating Point Number is

(–1) S × (1.F)2 × 2E – Bias

(–1) S × (1.f1f2f3f4 … ) 2 × 2E – Bias

(–1) S × (1 + f1×2-1 + f2×2-2 + f3×2-3 + f4×2-4 … ) 2 × 2E

– Bias

BITS Pilani, Deemed to be University under Section 3 of UGC

Examples of Single Precision
Float
 What is the decimal value of this Single Precision float?
 10111110001000000000000000000
000
 Solution:
 Sign = 1 is negative
 Exponent = (01111100)2 = 124, E – bias = 124 –
127 = – 3
O Significand
 Significand= (1.0100 … 0)
= (1.0100 …2 X0)22 –=
3 = 0.0010100=
1 + 2-2 = 1.25 (1. is
r –0.15625
implicit)
 What is the decimal value of?

 0 Value in decimal = –1.25 × 2 – 3 = –0.15625
100000100100110000000000000
0000
 Solution: Implicit

 Value in decimal = +(1.01001100 … 0)2 ×

2130–127

= (1.01001100 … 0)
BITS Pilani, Deemed to 2be× 23 =under
University (1010.01100
Section 3 of UGC … 0)2
Converting FP Decimal to
Binary

 Convert –0.8125 to binary in single

precision

 Solution:
=
 Fraction bits 1.25can be obtained using multiplication by
 0.8125 × 2== 0.51.625
 0.625 × 2 = 1.0
0.8125
0.25=× (0.1101)2 = ½ + ¼ + 1/16 = 13/16
2

 0.5 ×
Fraction 2
= (0.1101)2 = (1.101)2 × 2
– 1 (Normalized)

 Stop when fractional part is 0

 Exponent = – 1 + Bias = 126 (single precision)= 7E H

101111110101000000000000000
00000 BITS Pilani, Deemed to be University under Section 3 of UGC
Largest Normalized Float

 What is the Largest normalized float?

 Solution for Single Precision:

 011111110111111111111111111
11111

 Exponent – bias = 254 – 127 = 127 (largest

exponent for SP)
 Significand = (1.111 … 1)2 = almost 2

 Value in decimal ≈ 2 × 2127 ≈ 2128 ≈ 3.4028 … ×

1038

 Overflow: exponent is too large to fit in the

exponent field BITS Pilani, Deemed to be University under Section 3 of UGC
Smallest Normalized Float

 What is the smallest (in absolute value) normalized

float?

 Solution for Single Precision:

00000000100000000000000000000
000

 Exponent – bias = 1 – 127 = –126 (smallest

exponent for SP)

 Significand = (1.000 … 0)2 = 1

 Value in decimal = 1 × 2 –126 = 1.17549 … ×

10 –38
BITS Pilani, Deemed to be University under Section 3 of UGC
Zero, Infinity, and NaN

 Zero
 Exponent field E = 0 and fraction F = 0
 +0 and – 0 are possible according to
sign bit S
 Infinity
 Infinity is a special value represented with maximum
E and
F=0
 For single precision with 8-bit exponent: maximum E
= 255
 Infinity can result from overflow or division by zero
 +∞ and – ∞ are possible according to sign bit S

 NaN (Not a Number)

 NaN is a special value represented with maximum E
and BITS Pilani, Deemed to be University under Section 3 of UGC
Summary of IEEE 754
Encoding

BITS Pilani, Deemed to be University under Section 3 of UGC

Floating Point Addition
Example
Consider Adding (Single-Precision Floating-
Point):
+ 1.111001000000000000000102 × 24
+ 1.100000000000001100001012 × 22
 Cannot add significands … Why?

 Because exponents are not equal

 How to make exponents equal?
 Shift the significand of the lesser exponent
right
 Difference between the two exponents = 4 – 2
=2
So, shift right second number by 2 bits and
increment exponent
1.100000000000001100001012 × 22 =
BITS Pilani, Deemed to be University under Section 3 of UGC
4
Floating-Point Addition –
cont'd
 Now, ADD the Significands:
+ 1.11100100000000000000010 × 24
+ 1.10000000000000110000101 × 22
----------------------------------------------------------------------------------------------------------------------------- -

+ 1.11100100000000000000010 × 24
+ 0.01100000000000001100001 01 × 24 (shift right)

+10.01000100000000001100011 01 × 24 (result)

 Addition produces a carry bit, result is NOT

normalized
 Normalize Result (shift right and increment
exponent):
+ 10.01000100000000001100011 01 × 24
= + 1.00100010000000000110001 101 × 25
BITS Pilani, Deemed to be University under Section 3 of UGC
Rounding
 Single-precision requires only 23 fraction bits
 However, Normalized result can contain additional
bits
1.00100010000000000110001 | 1 01 × 25
Round Bit: R = 1
Sticky Bit: S = 1
 Two extra bits are needed for rounding
 Round bit: appears just after the normalized result
 Sticky bit: appears after the round bit (OR of all
additional bits)
 Since RS = 11, increment fraction to round to
nearest
1.00100010000000000110001 × 25
+1
----------------------------------------------
BITS Pilani, Deemed to be University under Section 3 of UGC
5
 If the sign bits of the operands are different
 Consider Adding:
+ 1.00000000101100010001101 × 2-6
– 1.00000000000000010011010 × 2-1
-------------------------------------------------------------------------------
+ 0.00001000000001011000100 01101 × 2-1 (shift right 5
bits)
– 1.00000000000000010011010 × 2-1
---------------------------------------------------------------------------------
0 0.00001000000001011000100 01101 × 2-1
1 0.11111111111111101100110 × 2-1 (2's
complement)

1 1.00001000000001000101010 01101 × 2-1 (ADD)

- 0.11110111111110111010101 10011 × 2-1 (2's

complement)
 2's complement of result is required if result is
negative BITS Pilani, Deemed to be University under Section 3 of UGC
Subtraction FP
contnd.
+ 1.00000000101100010001101 × 2-6
– 1.00000000000000010011010 × 2-1
- 0.11110111111110111010101 10011 × 2-1 (result is
negative)
 Result should be normalized
 For subtraction, we can have leading zeros. To
normalize, count the number of leading zeros,
then shift result left and decrement the exponent
accordingly.
Guard bit
- 0.11110111111110111010101 1 0011 × 2-1
- 1.11101111111101110101011 0011 × 2-2 (Normalized)
Guard bit
 Guard bit: guards against loss of a fraction bit
Needed for subtraction, when result has a leading
zero and should beBITSnormalized.
Pilani, Deemed to be University under Section 3 of UGC
Subtraction FP contnd.
 Next, normalized result should be
rounded
Guard bit
011××2-12-2
0.11110111111110111010101 1 0 0011
- 1.11101111111101110101011
Round bit: R=0 (Normalized)
Sticky bit: S = 1
Since R = 0, it is more accurate to truncate the
result even if S = 1. We simply discard the extra
bits.

- 1.11101111111101110101011 0 011 × 2-2

(Normalized)
- 1.11101111111101110101011 × 2-2 (Rounded to
nearest) Exponent -2 is biased to -2+127= 125=
7D H
 IEEE 754 Representation
BITS Pilani, Deemed to be University under Section 3 of UGC
of Result
Rounding to Nearest
Even
 Normalized result has the form: 1. f1 f2 … fl R S
 The round bit R appears after the last fraction bit
fl
 The sticky bit S is the OR of all remaining
additional bits
 Round to Nearest ResultEven:
is Exact, no need
default for rounding
rounding mode
 Four cases for Truncate
RS: result by discarding RS
 RS = 00  Increment result: ADD 1 to last
 RS = 01  fraction bit
 RS = 11  Tie Case (either truncate or
 RS Check Lastincrement
= 10  fraction bitresult)
fl (f23 for single-precision or
f52 for double)
 If fl is 0 then truncate result to keep fraction even
 If fl is 1 then increment result to make fraction even
BITS Pilani, Deemed to be University under Section 3 of UGC
Floating Point Multiplication Example

 Consider multiplying:
-1.110 1000 0100 0000 1010 00012 × 2–4
× 1.100 0000 0001 0000 0000 00002 × 2–2
 Unlike addition, we add the
exponents of the operands
 Result exponent value = (–4) + (–
2) = – 6
 Using the biased representation: EZ
= EX + EY – Bias
 EX = (–4) + 127 = 123 (Bias = 127
for single precision)
 EY = (–2) + 127 = 125
 EZ = 123 + 125 – 127 = 121
(value = –6)
 Sign bit of product can be
BITS Pilani, Deemed to be University under Section 3 of UGC
FP Multiplication contnd.
 Now multiply the
(Multiplicand)
significands: 1.11010000100000010100
(Multiplier) 001
×
111010000100000010100001
1.1000000000100000000
111010000100000010100001 0000
1.11010000100000010100001
10.101110001111101111110011001010000100
0000000000
 24 bits × 24 bits  48 bits (double number
of bits)
 Multiplicand × 0 = 0 Zero rows are eliminated
 Multiplicand × 1 = Multiplicand (shifted left)

BITS Pilani, Deemed to be University under Section 3 of UGC

FP Multiplication contnd.
 Normalize Product:
-10.10111000111110111111001100... × 2-6
Shift right and increment exponent because of carry
bit
= -1.010111000111110111111001100... × 2-5
 Round to Nearest Even: (keep only 23 fraction
bits)
1.01011100011111011111100 | 1 100... × 2-5

Round bit = 1, Sticky bit = 1, so increment

fraction Final result = -
1.01011100011111011111101 × 2-5
 IEEE 754 Representation
1 0 1 1 1 1 0 1 0 BITS
01 011100011111
Pilani, Deemed to be University under Section 3 of UGC
Examples
Final representation: (-1)S x (1 + Fraction) x 2(Exponent
– Bias)

• Represent -0.75ten in single and double-precision

formats

Single: (1 + 8 + 23)

Double: (1 + 11 + 52)

•What decimal number is represented by the

following single-precision number?
1 1000 0001 01000…0000
BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani Campus

•DS Rao COA -IS ZC353

Objectives

 Machine instructions and program

execution, including branching and
subroutine call and return operations.

 Addressing methods for accessing

 Assembly language for representing

machine instructions, data, and
programs.

 Program-controlled Input/Output
operations. BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and
Operation n
bits
first word
Memory consists second
of many millions word

of storage cells, •
each of which •
•
can store 1 bit.
i th
Data is usually word

accessed in n- •
bit groups. n is •
•
called word
length. last
word

Figure Memory
BITS Pilani, Deemed to be 2.5.
University underwords.
Section 3 of UGC
Memory Location, Addresses, and Operation

32-bit word length example

32 bits
b3 b3
b1 b0

•
1 0

•
•
Sign bit:b31= 0 for positive numbers
b31= 1 for negative numbers
(a) A signed integer

8 bits 8 bits 8 bits 8 bits

ASCII ASCII ASCII ASCII

character character character
character
(b) Four characters
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and Operation

 To retrieve information from memory, either for one

word or one byte (8-bit), addresses for each location
are needed.
 A k-bit address memory has 2k memory locations,
namely 0 to 2k-1, called memory space.

 24-bit memory: 224 = 16,777,216 = 16M (1M=220)

 32-bit memory: 232 = 4G (1G=230)

 1K(kilo)=210
 1T(tera)=240

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory Location, Addresses, and
Operation
 It is impractical to assign distinct addresses to
individual bit locations in the memory.

 The most practical assignment is to have

successive addresses refer to successive byte locations
in the memory – byte- addressable memory.

 Byte locations have addresses 0, 1, 2, …

 If word length is 32 bits, they successive words

are located at addresses 0, 4, 8 , …
BITS Pilani, Deemed to be University under Section 3 of UGC
Big-Endian and Little-Endian
Assignments
Big-Endian: lower byte addresses are used for the most significant bytes of
the word
Little-Endian: opposite ordering. lower byte addresses are used for the less
significant bytes of the word
Word
addres Byte Byte
s address address
0 0 1 2 3 0 3 2 1 0

4 4 5 6 7 4 7 6 5 4

• •
• •
• •

k k k k
k k k k k k
2 - 2 -4 2 -3 2- 2 - 1 2 - 2- 2 - 2 2 - 3 2 -4
4 2 4 1

(a) Big-endian (b) Little-endian

assignment assignment
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory Location, Addresses, and
Operation
 Address ordering of bytes
 Word alignment
 Words are said to be aligned in memory if they begin at a byte
address. that is a multiple of the number of bytes in a word.
– 16-bit word: word addresses: 0, 2, 4 , ….
– 32-bit word: word addresses: 0, 4, 8 , ….
– 64-bit word: word addresses: 0, 8,16,….
Memory Operation
Load (or Read or Fetch)
 Copy the content. The memory content doesn’t change.
 Address – Load
 Registers can be

used Store (or Write)

 Overwrite the
content in memory
 Address and Data –
Store
BITS Pilani, Deemed to be University under Section 3 of UGC
“Must-Perform”
 Operations
Data transfers between the memory and the processor
registers
 Arithmetic and logic operations on data
 Program sequencing and control
 I/O transfers
Assembly Language Notation
 Represent machine instructions and
programs.
 Move LOC, R1 = R1←[LOC]
 Add R1, R2, R3 = R3 ←[R1]+[R2]
 or
 Add R1, R2, R3 = R1 ←[R2]+[R3]

BITS Pilani, Deemed to be University under Section 3 of UGC

CPU
Organization
 Single Accumulator
 Result usually goes to the Accumulator
 Accumulator has to be saved to
memory quite often
 General Register
 Registers hold operands thus reduce
memory traffic
 Register book keeping
 Stack
 Operands and result are always in the
stack

BITS Pilani, Deemed to be University under Section 3 of UGC

Instruction
Formats
Opcode Operand(s) or Address(es)

Three-Address Instructions
▪ADD R1, R2, R3 R1 ← R2 + R3 or R3 ← R1
+ R2
Two-Address Instructions
▪ADD R1, R2 R1 ← R1 + R2
One-Address Instructions
▪ADD M AC ← AC + M[AR]
Zero-Address
▪ADD Instructions
TOS ← TOS + (TOS –
RISC Instructions 1)
▪Lots of registers. Memory is restricted to Load
& Store
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats
Example: Evaluate (A+B) 
(C+D)
Three-Address
1. ADD R1, A, B ; R1 ← M[A] +
2. ADD M[B]
R2, C, D
; R2 ← M[C] +
M[D]
3. MUL X, R1, R2 ; M[X] ← R1  R2
Example: Evaluate (A+B) 
(C+D) Two-Address
1. MOV R1, A ; R1 ← M[A]
2. ADD R1, B ; R1 ← R1 +
M[B]
3. MOV R2, C ; R2 ← M[C]
4. ADD R2, D ; R2 ← R2 +
M[D]
5. MUL R1, R2 ; R1 ← R1  R2
6. MOV X, R1 ; M[X] ← R1

BITS Pilani, Deemed to be University under Section 3 of UGC

Instruction
Formats
Exampl Evaluate (A+B)  (C+D) One-
e: 1. Address
LOAD A ; AC ← M[A]
2. ADD B ; AC ← AC +
3. STORE M[B]
T
4. LOAD ; M[T] ← AC
5. ADD C
; AC ← M[C]
6. MUL D
7. STORE ; AC ← AC +
T
M[D]
X
; AC ← AC  M[T]
; M[X] ← AC
Example: Evaluate  (C+D) RISC
(A+B)
1. LOAD R1, A ; R1 ← M[A]
2. LOAD R2, B ; R2 ← M[B]
3. LOAD R3, C ; R3 ← M[C]
4. LOAD R4, D ; R4 ← M[D]
5. ADD R1, R1, R2 ; R1 ← R1 + R2
6. ADD R3, R3, R4 ; R3 ← R3 + R4
7. MUL R1, R1, R3 ; R1 ← R1  R3
8. X, R1 ; M[X] ← R1
STORE BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Data
Types
 8 (byte), 16 (halfword), 32 (word) bits
Halfword and word accesses should be word
aligned Nonaligned access alternatives
▪Default
– Treated as truncated
– Bits[1:0] treated as zero for word
– Bit[0] treated as zero for halfword
– Load single word instructions rotate right word aligned data
transferred by non word-aligned address one, two or three
bytes Alignment checking
▪Data abort signal indicates alignment fault for attempting
unaligned access
▪Unaligned access
▪Processor uses one or more memory accesses to generate
transfer of adjacent bytes transparently
BITS Pilani, to the
Deemed to be University underprogrammer
Section 3 of UGC
ARM Data Types
contnd.
Unsigned integer interpretation supported for all types

Twos-complement signed integer interpretation supported for

all types

Majority of implementations do not provide floating-point

hardware

▪Saves power and area

▪Floating-point arithmetic implemented in software
▪Optional floating-point coprocessor
▪Single- and double-precision IEEE 754 floating point data
types

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM Operation Types
 The ARM architecture provides a large collection of operation types
 The following are the principal categories:
 Load and store instructions: only load and store instructions access
memory
locations; arithmetic and logical instructions are performed only
on registers and immediate values encoded in the instruction
 A Branch instructions: Allows a conditional branch forwards or
backwards up to
32 MB.
 Data-processing instructions:includes logical instructions (AND,
OR, XOR), add and subtract instructions, and test and compare
instructions
 Multiply instructions: The integer multiply instructions
operate on word or halfword operands and can produce normal
BITS Pilani, Deemed to be University under Section 3 of UGC
or long results
ARM Operation
Types contnd.
 Parallel addition and subtraction instructions

 Extend instructions: There are several instructions for

unpacking data

by sign or zero extending bytes to halfwords or words, and

halfwords to words.

 Status register access instructions: ARM provides the ability

to read and

also to write portions of the status register.

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM Endian Support
E-bit in system control
register Under program
control

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Using
Registers
 Registers are faster
 Shorter instructions
– The number of registers is smaller (e.g. 32 registers
need 5 bits)
 Potential speedup
Minimize the frequency with which data is moved back
and forth between the memory and processor registers.

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM Condition
Codes
Condition code flags
Condition code register / status
register N (negative)
Z (zero)
V
(overflow)
C (carry)
Different
instruction
s affect
different
flags BITS Pilani, Deemed to be University under Section 3 of UGC
Conditional Branch
Instructions

Example: A: 11110000
▪A: 1 1 1 1 0 0
00
▪B: 0 0 0 1 0 1
+(−B): 1 1 1 0 1 1 0
00 0
11011100

C=1 Z=0
S=1
V=0
Overflow can only happen when adding two numbers of the same sign
and getting a different sign. So, to detect overflow we don't care about
any bits except the sign bits. Ignore the other bits.

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM Conditions for Conditional Instruction
Execution

BITS Pilani, Deemed to be University under Section 3 of UGC

Expression Evaluation
 Infix notation: A binary operator appears between the operands
 𝑎 + (𝑏 × 𝑐), will yield a different result from (𝑎 + 𝑏) × 𝑐
 Reverse Polish, or postfix,
notation a + b becomes a
b+

𝑎 + (𝑏 × 𝑐),
becomes a b c x +

(𝑎 + 𝑏) × 𝑐 becomes
ab+cx

 Regardless of the complexity of an expression, no parentheses are

required when using reverse Polish
 An expression in this form is easily evaluated using a stack
 Rules
1. If the element is a variable or constant, push it onto the stack.
BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats
TOS= TOS (operation)
(TOS-1)
Example: Evaluate (A+B) 
(C+D) Zero-Address
1. PUSH ; TOS ← A
A ; TOS ← B B
2. PUSH ; TOS ← (A + B) A

B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction Formats

Example: Evaluate (A+B) 

(C+D) Zero-Address
1. PUSH ; TOS ← A D
A ; TOS ← B C
2. PUSH ; TOS ← (A + B)
A+B

B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
Instruction
Formats
Example: Evaluate (A+B) 
(C+D) Zero-Address
1. PUSH ; TOS ← A
A ; TOS ← B
2. PUSH ; TOS ← (A + B) (C+D)*(A+B)
B ; TOS ← C
3. ADD ; TOS ← D
4. PUSH ; TOS ← (C + D)
C ; TOS ←
5. PUSH (C+D)(A+B)
D ; M[X] ← TOS
6. ADD
7. MUL BITS Pilani, Deemed to be University under Section 3 of UGC
BITS
Pilani
Pilani | Dubai | Goa |
Hyderabad

Already included this in 8086 , students have to learn this on

their own

Addressing Modes
BITS Pilani, Deemed to be University under Section 3 of UGC
• 8086 Intel Microprocessor
Architecture

BITS Pilani, Deemed to be University under Section 3 of UGC

Generating Memory
Addresses
How to specify the
address of branch
target?
Can we give the memory operand address
directly in a single Add instruction in the
loop?

Use a register to hold the address of

NUM1; then increment by 4 on each pass
through the loop.

BITS Pilani, Deemed to be University under Section 3 of UGC

Addressing
Modes
Opcode Mode ...

Implied
▪AC is implied in “ADDM[AR]” in “One-
Address” instr.
▪TOS is implied in “ADD” in “Zero-
Address” instr. Immediate
▪The use of a constant in R1, 5”, i.e. R1
“MOV 5 ←
Register
▪Indicate which register
holds the operand
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes

Register Indirect
▪Indicate the register that holds the number of the register
that holds
the
MOVoperand
R1,
Autoincrement
(R2) / Autodecrement R1
▪Access & update in 1
instr. Direct Address R2 = 3
▪Use the given address
to access a memory R3 = 5
location

BITS Pilani, Deemed to be University under Section 3 of UGC

Addressing
Modes
Indirect Address
▪Indicate the memory location that holds the
address of the memory location that holds the
data

AR = 101

100
101 0 1 0 4
102
103
104 1 1 0 A

BITS Pilani, Deemed to be University under Section 3 of UGC

Addressing
Modes
Relative Address
▪EA = PC + Relative 0
Addr 1
PC = 2 2

+
100
AR = 100
101
Could be 102 1 1 0
Positive or A
Negative 103
(2’s 104
Complemen
t)
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Indexe
d
▪EA = Index Register + Relative
Addr
Useful with
“Autoincrement” XR = 2
or
“Autodecrement” +
100
AR = 100
101
102 1 1 0
Could be A
Positive or 103
Negative 104
(2’s
Complemen
BITS Pilani, Deemed to be University under Section 3 of UGC
Addressing
Modes
Base Register
▪EA = Base Register + Relative
Addr

Could be Positive or AR = 2
Negative
(2’s Complement)
+

100 0 0 0 5
BR = 100
101 0 0 1 2
102 0 0 0 A
Usually points to 103 0 1 0 7
the beginning of 104 0 0 5 9
an array

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM Indexing
Methods
RISC machine, unlike a CISC
machine has simple and relatively
straightforward set of addressing
modes
ARM architecture departs somewhat
from this tradition by providing a
relatively rich set of addressing
modes.

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM Addressing Modes
Load/Store
 Only instructions that reference memory

 Indirectly through base register plus offset

 Offset

 Offset added to or subtracted from base register

contents to form the memory address

 Preindex

 Memory address is formed as for offset addressing

 Memory address also written back to base register

 So base register value incremented or decremented by

offset value BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Addressing Modes
Load/Store
 Postindex

 Memory address is base register value

 Offset added or subtracted

Result written back to base register

 Base register acts as index register for preindex and postindex

addressing

 Offset either immediate value in instruction or another register

 If register scaled register addressing available

 Offset register value scaled by shift operator

 Instruction specifiesBITS
shift size
Pilani, Deemed to be University under Section 3 of UGC
ARM Data Processing Instruction
Addressing & Branch
Instructions
 Data Processing

 Register addressing

– Value in register operands may be scaled using a

shift operator

 Or mixture of register and immediate addressing

 Branch

 Immediate

 Instruction contains 24 bit value

 Shifted 2 bits left

– On word boundary
– Effective range +/-32MB from PC.
BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Load/Store
Multiple Addressing
 Load/store subset of general-purpose registers

 16-bit instruction field specifies list of registers

 Sequential range of memory addresses

 Increment after, increment before, decrement after, and

decrement

before

 Base register specifies main memory address

 Incrementing or decrementing starts before or after

first memory access

BITS Pilani, Deemed to be University under Section 3 of UGC
ARM Load/Store
Multiple Addressing

BITS Pilani, Deemed to be University under Section 3 of UGC

Instruction Formats
 Layout of bits in an instruction
 Includes opcode
 Includes (implicit or explicit) operand(s)
 Usually more than one instruction format in an instruction
set

Instruction Length
 Affected by and affects:
 Memory size
 Memory organization
 Bus structure
 CPU complexity
 CPU speed
 Trade off between powerful instruction repertoire and
saving space BITS Pilani, Deemed to be University under Section 3 of UGC
Allocation
ofaddressing
 Number of Bits
modes
 Number of operands
 Register versus memory
 Number of register sets
 Address range
 Address granularity

BITS Pilani, Deemed to be University under Section 3 of UGC

In class Example-1
Assume an instruction set that uses a fixed 16-bit instruction length. Operand
specifiers are 6 bits in length. There are K two-operand instructions and L zero-
operand instructions. What is the maximum number of one-operand instructions?

BITS Pilani, Deemed to be University under Section 3 of UGC

In class Example-1
Design a variable-length opcode to allow all of the following to be encoded
in a 36-bit instruction:
• instructions with two 15-bit addresses and one 3-bit register number
• instructions with one 15-bit address and one 3-bit register number
• instructions with no addresses or registers

BITS Pilani, Deemed to be University under Section 3 of UGC

ARM
Instruction
Formats

S = For data processing instructions, updates condition codes

S = For load/store multiple instructions, execution restricted to
supervisor mode P, U, W = distinguish between different types of
addressing_mode
B = Unsigned byte (B==1) or word (B==0) access
L = For load/store instructions, Load (L==1) or Store (L==0)
BITS Pilani, Deemed to be University under Section 3 of UGC
L = For branch instructions, is return address stored in link register
Examples of Use of ARM Immediate
Contants

BITS Pilani, Deemed to be University under Section 3 of UGC

Expanding a Thumb ADD Instruction into its
ARM Equivalent

BITS Pilani, Deemed to be University under Section 3 of UGC

BITS Pilani
Pilani | Dubai | Goa | Hyderabad

CPU simulator as example

BITS Pilani, Deemed to be University under Section 3 of UGC
How to access your Virtual lab
portal?

1. Login into your BITS WILP Elearn portal (https://elearn.bits-

pilani.ac.in/)
2. Click on "My Virtual Labs" under Important Links

Can also be downloaded from the below link and installed on your
PC, it is a free tool

Downloads – CPU-OS Simulator (teach-sim.com)

MSI Installer version for use by Windows Installer in silent mode.
Used to
distribute and install across several
systems. CPU-OS Simulator
7-5-50-msi.zip

BITS Pilani, Deemed to be University under Section 3 of UGC

Example
s
1.locate the instruction, which is used to store one byte of data in a
memory location. Use it to store number 65 in address location 20 (all
numbers are in decimal). This is an example of direct addressing.
STB #65,20
2.Create an instruction to move decimal number 22 to register R01 and
make a note of it below. Execute this instruction and verify the result in
R01.
MOV #22,R01
3.Create an instruction to store decimal number 51 in
memory location the address of which is currently stored in register R01.
This is an example of indirect addressing. Note the use of the “@” prefix
next to R01 in this case.

STB #51,@R01

4. Make a note of what you see in data memory locations 20 and 22

BITS Pilani, Deemed to be University under Section 3 of UGC
3.Create an instruction to store decimal number
51 in memory location the address of which is
currently stored in register R01. This is an
example of indirect addressing. Note the use of
the “@” prefix next to R01 in this case.

STB #51,@R01

4.Make a note of what you see in data

memory locations 20 and 22

BITS Pilani, Deemed to be University under Section 3 of UGC

Creating A Loop
Now, let’s create a loop. First, enter the following code.
The # prefix is used to denote a literal value thus
distinguishing it from an address value which does not
use it.
MOV #0, R01
ADD #1, R01
CMP #5, R01
JNE 0
HLT

in order to make the code more flexible we can use

labels to represent instruction addresses

BITS Pilani, Deemed to be University under Section 3 of UGC

Naming the LOOP

Type label name L0 in the box next to the ENTER

LABEL
button in the window you use to enter instructions

Click the ENTER LABEL button

The new code should now look like this
(modifications are in red colour):

MOV #0, R01

L0:
ADD #1, R01
CMP #5, R01
JNE $L0
HLT
BITS Pilani, Deemed to be University under Section 3 of UGC
Elements of a machine instruction

 Opcode
 Operand
 Direct
 operand references
 Eg: Add A 87H
 or Add A B
 Next Instruction
Instruction Set Design Issue
 Type of Operation ?
 Kind of Data ?
 Instruction Format ?
 Number of internal Registers in
CPU ?
 Addressing Modes?

BITS Pilani, Deemed to be University under Section 3 of UGC

Questions on simulator

 Type of Operation ?
 Kind of Data ?
 Instruction Format ?
 Number of internal Registers in
CPU ?
 Addressing Modes?

BITS Pilani, Deemed to be University under Section 3 of UGC

Assembler
 Machines store and understand binary
instructions
 E.g. N= I + J + K initialize I=2, J=3, K=4
 Program starts in location 101
 Data starting 201
 Code:
 Load contents of 201 into AC
 Add contents of 202 to AC
 Add contents of 203 to AC
 Store contents of AC to 204
 Tedious and error prone

BITS Pilani, Deemed to be University under Section 3 of UGC

Improvements
 Use hexadecimal rather than binary
 Code as series of lines
 Hex address and memory address
 Need to translate automatically using
program
 Add symbolic names or mnemonics for
instructions
 Three fields per line
 Location address
 Three letter opcode
 If memory reference: address
 Need more complex translation program

BITS Pilani, Deemed to be University under Section 3 of UGC

Program
in: Binary Hexadecim
al
Address Contents Address Contents

101 0010 0010 101 2201 101 2201

102 0001 0010 102 1202 102 1202

103 0001 0010 103 1203 103 1203

104 0011 0010 104 3204 104 3204

201 0000 0000 201 0002 201 0002

202 0000 0000 202 0003 202 0003

203 0000 0000 203 0004 203 0004

204 0000 0000 204 0000 204 0000

BITS Pilani, Deemed to be University under Section 3 of UGC

Symbolic
Addresses
 First field (address) now symbolic

 Memory references in third field now symbolic

 Now have assembly language and need an assembler to

translate

 Assembler used for some systems programming

▪Compliers
▪I/O routines

BITS Pilani, Deemed to be University under Section 3 of UGC

Computation of the
Formula N = I + J + K

BITS Pilani, Deemed to be University under Section 3 of UGC

Adressing Mode In class
example
Given 3values and a one-address machine with
the following memory
an accumulator, what values do the following instructions load into
the accumulator?
• Word 20 contains 40.
• Word 30 contains 50.
• Word 40 contains 60.
• Word 50 contains 70.
a. LOAD IMMEDIATE 20
b. LOAD DIRECT 20
c. LOAD INDIRECT 20
d. LOAD IMMEDIATE 30
e. LOAD DIRECT 30
f. LOAD INDIRECT 30

a.
Sol:2
0 d. 30 e. 50 f .
b. 4 70
0
c. 6
0
BITS Pilani, Deemed to be University under Section 3 of UGC
In class example 4
Consider a 16-bit processor in which the following appears in main memory, starting at
location 200:
The first part of the first word indicates
that this instruction loads a value into
an accumulator.
The Mode field specifies an addressing mode and, if appropriate, indicates a source register;
assume that when used, the source register is R1, which has a value of 400.There is also a
base register that contains the value 100.The value of 500 in location 201 may be part of the
address calculation. Assume that location 399 contains the value 999, location 400 contains
the value 1000, and so on. Determine the effective address and the operand to be loaded for
the following address modes:

BITS Pilani, Deemed to be University under Section 3 of UGC

Contnd.
Determine the effective address and the operand to be
loaded for the following address modes

a. Direct d. PC relative g. Register indirect

b. Immediate e. Displacement,
c. Indirect f. Register
h. Autoindexing with increment using R1

Mode→ R1 and contains A 500 1100

B 201 500
400 Base register 100
C 1100 1700
Adrs 399 → 999 & 400→ D 702 1302
1000 E 600 1200
F R1 400
G 400 1000
H 400 1000

BITS Pilani, Deemed to be University under Section 3 of UGC

In class example 5
Assume a stack-oriented processor that includes the stack operations PUSH
and POP. Arithmetic operations automatically involve the top one or two
stack elements. Begin with an empty stack.What stack elements remain
after the following instructions
are
executed?
PUSH 7 Push 4 4
Push 7 7,4
PUSH 8
4 Push 8 8,7,4
ADD ADD 15,4
PUSH 10,15,
PUSH 10 SUB 4
10 SUB MUL 5,4
20
MUL

BITS Pilani, Deemed to be University under Section 3 of UGC

In class example 6
Compare zero-, one-, two-, and three-address machines by writing
programs to compute
for each of the four machines.The instructions available for use are
as follows:
X = (A + B X C) ∕ (D - E X F)
0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M

BITS Pilani, Deemed to be University under Section 3 of UGC

0 Address 1 Address 2 Address 3 Address
PUSH LOAD M MOVE(x←y ) MOVE (x←y )
M POP STORE M ADD ADD (x←y
M ADD ADD (x←x+y ) +z) SUB
SUB M SUB (x←x-y ) (x←y -z) MUL
MUL SUB M MUL (x←x X y (x←y X Z)
DIV MUL ) DIV DIV
M (x←x/y ) (x←y/z )
DIV M

C
B
A

X = (A + B X C) ∕ (D - E
X F)

BITS Pilani, Deemed to be University under Section 3 of UGC

BXC
A

X = (A + B X C) ∕ (D - E
X F)

BITS Pilani, Deemed to be University under Section 3 of UGC

X = (A + B X C) ∕ (D - E
X F)

BITS Pilani, Deemed to be University under Section 3 of UGC

ExF
D
A+BXC

X = (A + B X C) ∕ (D - E
X F)

BITS Pilani, Deemed to be University under Section 3 of UGC

D-ExF
A+BXC

X = (A + B X C) ∕ (D - E
X F)

BITS Pilani, Deemed to be University under Section 3 of UGC

A+BxC /
D-ExF

X = (A + B X C) ∕ (D - E
X F)

BITS Pilani, Deemed to be University under Section 3 of UGC

BITS
Pilani
Pilani Campus

DSRao
COA -IS ZC353
Memory
Characteristics of memory
systems
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical
characteristics

BITS Pilani, Pilani

The Big Picture: Where are We
Now?
• The Five Classic Components of a Computer
• Memory is usually implemented as:
– Dynamic Random Access Memory (DRAM) - for main
memory
– Static Random Access Memory (SRAM) - for cache
Processor
Input
Control
Memory

Datapath Output

BITS Pilani, Deemed to be University under Section 3 of UGC

Technology
Trends

Capacity Speed
(latency)
Logic: 2x in 3 2x in 3 years
years
DRAM: 4x in 3 2x in 10 years
years
Disk: 4x in 3DRAM 2x in 10 years
years
Year Size Cycle Time
1980 1000:1! 64 Kb 2:1! 250 ns
1983 256 Kb 220 ns
1986 1 Mb 190 ns
1989 4 Mb 165 ns
1992 16 Mb 145 ns
1995 64 Mb 120 ns
1998 256 Mb 100 ns
2001 1 Gb 80 ns

BITS Pilani, Deemed to be University under Section 3 of UGC

Who Cares About
Memory?
Processor-DRAM Memory Gap (latency)

1000 µProc CPU

60%/yr.
“Moore’s Law”
(2X/1.5yr)
Performance

100 Processor-Memory
Performance Gap:
(grows 50% / year)
10
DRAM
DRAM
9%/yr.
1 (2X/10 yrs)
1980
1981
1982
1983
1984
1985

2000
1986
1987

1989
1990
1991
1992
1993
1994
1995

1997
1998
1999
1988

1996
Time
BITS Pilani, Deemed to be University under Section 3 of UGC
Locatio
n
• Internal (main)
– CPU : In the form of registers, control unit
memory
– Cache , main memory
• External(secondary) CONTROL UNIT

– Disk andtapes Sequencing

logic
Control unit
registers
and
decoders
CPU

Control
ALU memory
Registers

CPU
Internal CPU
Interconnection
(Bus)

Control unit

BITS Pilani, Pilani

Capacit
y
• Number of bytes or words
• Word size
– The natural unit of organisation
• Common word lengths are 8,16,
and 32 bits

BITS Pilani, Pilani

Unit of
Transfer
• Internal
– Usually governed by data bus width
• Three related concepts for internal memory
– Word : The size of the word is typically equal to the
number of bits
used to represent an integer and to the instruction length.
» Exceptions:
• CRAY C90 has a 64-bit word length but uses a 46-bit integer
representation.
• The Intel x86 architecture has a wide variety of instruction
lengths, expressed as multiples of bytes, and a word size of 32
bits.
– Addressable units (Address space) : 2A = N
» Addressability : How many bits per location?
» Total Size= Addressability * Addressable unit
– Unit of transfer : Word or block
• External
– Usually a block which is much larger than a word
BITS Pilani, Pilani
Access Methods
(1)
• Sequential
– Start at the beginning and read through in
order
– Access time depends on location of data and
previous location
– e.g. tape

• Direct
– Individual blocks have unique address
– Access is by jumping to vicinity plus sequential
search
– Access time is variable i.e. access time
depends on location and previous location
BITS Pilani, Pilani
Access Methods
(2)
Random
– Addressable locations are identified by individual
addresses
– Access time is independent of location or previous
access
– e.g. RAM

Associative
– A word is retrieved based on a portion of its
content rather than its address.
– Access time is independent of location or previous
access
– e.g. cache
BITS Pilani, Pilani
Performanc
e
Access time or Latency
– Time between presenting the address and
getting the valid data
Memory Cycle time
– Memory Cycle time = access time +
Additional time
– Additional time may be required for transients
to die out on signal lines or to generate data if
they are read destructively
Transfer Rate
– Rate at which data can be transferred into
or out of a memory unit

BITS Pilani, Pilani

Physical
Types
• Semiconduct
or
– RAM
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others
– Bubble
– Hologram

BITS Pilani, Pilani

Physical
Characteristics
• Volatile vs Non-
volatile
• ROM vs RAM

BITS Pilani, Pilani

The Bottom
Line
• How much?
– Capacity
• How fast?
– Time is money
• How
expensive?

BITS Pilani, Pilani

The Bot om Line
….
• Faster access time , cost per
bit
• Greater capacity, cost per bit
• Greater capacity, access
time

BITS Pilani, Pilani

The Bot om Line
….
• Faster access time , greater cost
per bit
• Greater capacity, smaller cost per
bit
• Greater capacity, slower access
time

BITS Pilani, Pilani

Memory
Hierarchy

Memory technology Typical access time $ per GB in 2004

SRAM 0.5–5 ns $4000–$10,000
DRAM 50–70 ns $100–$200
Magnetic disk 5,000,000–20,000,000 ns $0.50–$2

CPU

Processor

Increasing
Level
distance from
1
the CPU in
access time
Levels in the Level 2
memory
hierarchy Data are
transferred

Level
n

Size of the memory at each

level

BITS Pilani, Deemed to be University under Section 3 of UGC

Hierarchy
List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape

down the hierarchy,

• Increasing capacity
• Decreasing cost per bit
• Increasing access time
• Decreasing frequency of access
of the memory by the processor
BITS Pilani, Pilani
Memory Hierarchy: How Does it
Work?
Temporal Locality (Locality in Time):
Keep most recently accessed data items closer to the
processor
 e.g., instructions in a loop
Spatial Locality (Locality in
Space):
Move blocks consists of
contiguous words to the upper
levels Lower Level
To instruction Upper Level
e.g., sequential Memory
Processor
access, array data Memory
Blk X
From Processor Blk Y

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory Hierarchy:
Terminology
 Hit: data appears in some block in the upper level (example: Block X)
 Hit Rate: the fraction of memory access found in the upper level
 Hit Time: Time to access the upper level which
consists of RAM access time + Time to
determine hit/miss
 Miss: data needs to be retrieve from a block in the
lower level (Block Y)
 Miss Rate = 1 - (Hit Rate)
 Miss Penalty: Time to replace a block in the upper level + Time
to deliver the block the processor
Hit Time << Miss Penalty Lower Level
To Upper Level Memory
Processor Memory
Blk X
From Processor Blk Y

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory Hierarchy of a Modern Computer
System
By taking advantage of the principle of locality:
▪Present the user with as much memory as is available in the cheapest
technology.
▪Provide access at the speed offered by the fastest technology.

Processor

Control
Secondary Tertiar
Storage y
Second Main (Disk) Storage
Level Memory (Tape)
Registers

Datapath On-
Cache
Chip
Cache
(SRAM) (DRAM)

Speed (ns): 1s 10s 100s 10,000,000s 10,000,000,000s

(10s ms) (10s sec)
Size (bytes): 100s Ks Ms Gs Ts

BITS Pilani, Deemed to be University under Section 3 of UGC

General Principles of
Memory
 Locality
Temporal Locality: referenced memory is likely to be referenced again soon (e.g. code within a loop)
Spatial Locality: memory close to referenced memory is likely to be referenced soon (e.g., data in a
sequentially access array)
 Definitions
Upper: memory closer to processor
Block: minimum unit that is present or not present
Block address: location of block in memory
Hit: Data is found in the desired location
Hit time: time to access upper level
Miss rate: percentage of time item not
found in upper level
 Locality + smaller HW is faster = memory
hierarchy
Levels: each smaller, faster, more
expensive/byte than level below BITS Pilani, Deemed to be University under Section 3 of UGC
In class Example
1
Suppose that the processor has access to two levels of memory. Level 1 contains
1000 words and has an access time of 0.01µ s; level 2 contains 100,000 words and
has an access time of 0.1µ s. Assume that if a word to be accessed is in level 1,
then the processor accesses it directly. If it is in level 2, then the word is first
transferred to level 1 and then accessed by the processor.
For simplicity, we ignore the time required for the processor to determine whether
the word is in level 1 or level 2.
Figure next shows the general shape of the curve that covers this situation. The
figure shows the average access time to a two-level memory as a function of the
hit ratio H, where H is defined as the fraction of all memory accesses that are
found in the faster memory (e.g., the cache), T1 is the access time to level 1, and
T2 is the access time to level 2.
As can be seen, for high percentages of level 1 access, the average total access
time is much closer to that of level 1 than that of level 2

BITS Pilani, Deemed to be University under Section 3 of UGC

In class example2
In our example, suppose 95% of the
memory accesses are found in the
cache. Then the average time to access
a word can be expressed as
(0.95)(0.01 µs) + (0.05)(0.01 µs + 0.1
µs)
= 0.0095 + 0.0055 = 0.015 µs
The average access time is much
closer
to 0.01µ s than to 0.1 µs, as desired

Performance of accesses
involving only level 1 (hit ratio)

BITS Pilani, Deemed to be University under Section 3 of UGC

Cach
e
Small amount of fast memory
Sits between normal main memory and
CPU May be located on CPU chip or
module

BITS Pilani, Deemed to be University under Section 3 of UGC

Cache operation - overview
 CPU requests contents of memory
location
Cache Design
 Check cache for this data  Size

 If present, get from cache (fast)  Mapping Function

 Replacement
 If not present, read required block
Algorithm
from main memory to cache  Write Policy
 Then deliver from cache to CPU
 Block Size
 Cache includes tags to identify
 Number of Caches
which block of main memory is in
each cache slot

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Cache/Main Memory
Structure

BITS Pilani, Pilani

Cache Read Operation -
Flowchart
Cache Misses - Read
• The control unit
1. must detect a miss
2. Stall the entire
processor
3. fetch the requested
data
from memory
• Steps taken by the control
unit on an instruction cache
miss
1. Send PC-4 to memory
2. Start reading the
main memory
3. Transferring block to
the cache
4. Restart the
instruction to start
fetching from the
BITS Pilani, Pilani
Read
Hit

BITS Pilani, Pilani

Read
Miss

Miss Penalty BITS Pilani, Pilani

Cache Misses -
Writes
 On an instruction cache miss, we
 Index the cache using bits 15-2 of the address
 Write the cache entry
 Data from processor is placed in data portion
 Bits 31-16 of address written into tag field
 Turn valid bit on
 Write the word to main memory using the entire address
 How do we keep main memory up to date ?
 Write-through means writing to both cache and main
memory when a miss occurs (to avoid inconsistent or
untrue memories)
 Write buffer uses a fast and small memory to store the
cache writes while it is waiting to be written out to
memory (in MIPS, it is 4 words)
 Write-back means writing out to main memory only when
the block is swapped out
BITS Pilani, Pilani
Write
hit

BITS Pilani, Pilani

Write
miss

BITS Pilani, Pilani

Memory Organizations (1)
 In part a, all components are one word wide
 In part b, a wider memory, bus and cache are utilized
 In part c, interleaved memory banks with a narrow bus
and cache are utilized

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory Organizations
(2)
 Assume that it takes

▪ 1 clock cycle to send the referenced address

▪ 15 clock cycles for each DRAM access initiated

▪ 1 clock cycle to send a word of data

 In part a, we have a 1 + (4 x 15) + (4 x 1) = 65 clock cycle miss

penalty, and a

(4x4)/65 = 0.25 bytes per clock cycle

 In part b, we have a 1 + (2 x 15) + (2 x 1) = 33 clock cycle miss

penalty, and a

(4x4)/33 = 0.48 bytes per clock cycle (assuming memory width of

two words)

▪ Wider bus and higher cache access time

 In part c, we have a 1 + (1x15) + (4 x 1) = 20 clock cycle miss

penalty, and a BITS Pilani, Deemed to be University under Section 3 of UGC
Four Questions for Memory Hierarchy
Designers

Q1: Where can a block be placed in the upper level?

(Block placement)

Q2: How is a block found if it is in the upper level?

(Block identification)

Q3: Which block should be replaced on a miss?

(Block replacement)

Q4: What happens on a write?

(Write strategy)

BITS Pilani, Deemed to be University under Section 3 of UGC

Q1: Where can a block be
placed?
 Direct Mapped: Each block has only one place
that it can appear in the cache.
 Fully associative: Each block can be placed
anywhere in
the cache.
 Set associative: Each block can be placed in a
restricted set of places in the cache.
If there are n blocks in a set, the cache placement
is called n-way set associative
What is the associativity of a direct mapped cache?

BITS Pilani, Deemed to be University under Section 3 of UGC

Locating data in the
Cache 31 30
0
13 12 11 21

Index is 10 bits, while tag is 20 e

Byt

offse
1
bits Hi
t
Ta
2
0 0
t
Dat
a
g
▪We need to address 1024 (210) words Inde
x

▪We could have any of 220 words per Index Valid Dat
a
Tag 0
cache 1
2
location

Valid bit indicates whether an

102
entry contains a valid 1
102

address or not Tag bits is 2

102
2
0
3
2
3

usually indicated by
address size – (log2(memory size) +
2)
▪E.g. 32 – (10 + 2) = 20
BITS Pilani, Deemed to be University under Section 3 of UGC
Elements of Cache
Design
 Cache Addresses
 Logical (also known as a virtual
cache)
 Physical  Write Policy
 Cache Size
 Write through
 Mapping Function
 Write back
 Direct
 Write once
 Associative
 Line Size
 Set Associative
 Replacement Algorithm  Number of caches
 Least recently used (LRU)  Single or two level
 First in first out (FIFO)  Unified or split
 Least frequently used (LFU)
 Random
BITS Pilani, Deemed to be University under Section 3 of UGC
Size does
matter
Cost
• Small enough so that the overall average cost per bit is
close to that of main memory alone.
Speed
• Large enough so that the overall average access time
is close to
that of cache alone.

BITS Pilani, Pilani

Direct mapped cache:
Example Main Memory
• 16 Bytes byte
addressable main 0 0 00
memory 0 0 01
• How many 0 0 10
address bits are 0 0 11
required?
0 1 00
• Memory
– How block
many size
cacheis 4 0 1 01
bytes
lines? 0 1 10
• Cache
eacof 8isByte
– cache 2 lines of 4 0 1 11
bytes
h
Cache Memory
1 0 00
1 0 01
Line 0 1 0 10
1 0 11
1 1 00
Line 1 1 1 01
1 1 10
1 1 11
BITS Pilani, Pilani
Cache Memory Main Memory
0 0 00
Line 0 0 0 01
Block 0
Line 1 0 0 10
0 0 11
0 1 00
0 1 01 Block 1
0 1 10
0 1 11
1 0 00
1 0 01
1 0 10 Block 2
1 0 11
1 1 00
1 1 01
1 1 10
1 1 11 Block 3
BITS Pilani, Pilani
Direct
Mapping
• Each block of main memory maps to only one
cache line
– i.e. if a block is in cache, it must be in one specific
place
– i = j modulo m
– where i = cache line number
– j = main memory block number
– m = number of lines in the cache
• Address is split in two parts
• Least Significant w bits identify unique word
• Most Significant s bits specify memory block
• The MSBs are split into a cache line field r and
a tag of
BITS Pilani, Pilani
Direct Mapping Cache
Organization

BITS Pilani, Pilani

Direct mapping-
Summary
• Address length = (s+w) bits
• Number of addressable units = 2s+w words or
bytes
• Block size = line size = 2w words or bytes
• Number of blocks in main memory = 2s+w /
2w = 2s
• Number of lines in cache = m = 2r
• Size of tag =(s-r) bits

BITS Pilani, Pilani

Direct mapping
Summary

1 1 2

Tag s-r Line or Slot r Word w

BITS Pilani, Pilani

Measuring and Improving
Cache Performance

cache miss: A request for data from the cache that cannot be fulfilled
because the data is not present in the cache
CPU time =
(CPU execution clock cycles + Memory-stall clock cycles) × Clock
cycle time

Assume the miss rate of an instruction cache is 3% and

the miss rate of the data cache is 5%. If a processor
has a CPI of 3 without any memory stalls and the miss
penalty is 120 cycles for all misses, . Assume the
frequency of all loads and stores is 40%. Determine
How much faster a processor would run with a
perfect cache that never missed

BITS Pilani, Deemed to be University under Section 3 of UGC

Assume the miss rate of an instruction cache is 3% and
the miss rate of the data cache is 5%. If a processor
has a CPI of 3 without any memory stalls and the miss
penalty is 120 cycles for all misses, . Assume the
frequency of all loads and stores is 40%. Determine
How much faster a processor would run with a
perfect cache that never missed

Total Miss penalty = Ic X(0.03 X 0.6 X 120 + 0.05 X 0.4 X 120)

X 100
= (2.16+2.4)Ic=4.56Ic
For Perfect cache execution time= 3 X Ic = 3Ic
Total with imperfect= 3 X 0.92Ic + 4.56Ic = 2.76Ic + 4.56Ic =
7.32Ic

Speed up = 7.32/3= 2.44

BITS Pilani, Deemed to be University under Section 3 of UGC
Direct Mapped cache
Cache always smaller
(Block address) modulo (Number of blocks in the cache)

BITS Pilani, Deemed to be University under Section 3 of UGC

Exampl
e

• 32-word memory
Inde Vali Tag Data
• 8-word cache x d
000
• (The addresses 001
010
011
below are word 100
101
addresses.) 110
111
Address Binary Cache Hit or miss
block
22 10110 110
26 11010 010
22 10110 110
26 11010 010
16 10000 000
3 00011 011
16 10000 000
18 10010 010

BITS Pilani, Deemed to be University under Section 3 of UGC

Example:

BITS Pilani, Deemed to be University under Section 3 of UGC

BITS Pilani, Deemed to be University under Section 3 of UGC
BITS Pilani, Deemed to be University under Section 3 of UGC
How many total bits are required for a directed mapped

cache with four byte blocks, if the cache contains 64 Kbytes of data and the
address length of the

processor is 32 bits? Assume that byte addressing is used.

With four byte blocks, the 64 Kbyte cache contains a total of 64K/4 = 16K blocks. The
cache has the following parameters
(a) Byte select size: 2 bits (since 22 = 4 bytes/block)
(b) Cache index size: 14 bits (since 214 = 16K blocks)
(c) Cache tag size: 16 bits (remaining bits from 32)
(d) Block size: 32 bits (4 bytes)
(e)Number of blocks: 16K blocks (cache size/block size)
Assuming that the cache has a valid bit, a cache tag, and
data for each block, the size of the cache is:
cache bits = number of blocks x (block size + tag size + 1)
= 214 x (32 + 16 + 1)
= 802,816 bits
BITS Pilani, Deemed to be University under Section 3 of UGC
How many bits are required if the block size is 32
bytes? if the cache
contains 64 Kbytes of data and the address length of the processor is 32
bits

With 32 byte blocks, the 64 Kbyte cache contains a total of 64K/32 = 2K blocks.
The cache has the following parameters
(a) Byte select size: 5 bits (since 25 = 32 bytes/block)
(b) Cache index size: 11 bits (since 211 = 2K blocks)
(c) Cache tag size: 16 bits (remaining bits from 32)
(d) Block size: 256 bits (32 bytes)
(e) Number of blocks: 2K blocks (cache size/block size)
cache bits = number of blocks x (block size + tag size + 1)
= 211 x (256 + 16 + 1)
= 559,104 bits
Note: Increasing the block size tends to decrease the number of bits needed to implement
the cache
BITS Pilani, Deemed to be University under Section 3 of UGC
Miss rate versus block
size

BITS Pilani, Deemed to be University under Section 3 of UGC

Mapping an Address to a Multiword Cache Block
Consider a cache with 32 blocks and a block size of 8 bytes. To what
block number does byte address 600 map?

Explore?
𝐵𝑦𝑡𝑒
Address of the block is 𝐴𝑑𝑑𝑟𝑒𝑠𝑠
𝐵𝑦𝑡𝑒𝑠 𝑝𝑒𝑟
This block contains all 𝑏𝑙𝑜𝑐𝑘of 8 bytes
addresses

𝐵𝑦𝑡𝑒
𝐴𝑑𝑑𝑟𝑒𝑠𝑠 × 𝐵𝑦𝑡𝑒𝑠 𝑝𝑒𝑟
𝐵𝑦𝑡𝑒 𝑝𝑒𝑟
𝑏𝑙𝑜𝑐𝑘
60
= 75 𝑏𝑙𝑜𝑐𝑘
08 is the block address
This maps to (75 modulo 32) = 11
In fact this contains all address between 600 to 607
BITS Pilani, Deemed to be University under Section 3 of UGC
Direct mapping
Summary
Cache Line Main Memory blocks assigned

0 0,m,2m,…., 2s-m

1 1, m+1, 2m+1…..,2s -m+1

. .
. .
. .

m-1 m-1,2m-1,…., 2s-1

BITS Pilani, Pilani

Analyse Directly Mapped cache organization using different cache
size and
block size : Explain what cache hit and miss rates are
Write a new program in the CPU simulator and name it as call it
Cache understanding1
MOV #0, R02
STB R02, @R02
CMP #63, R02
JEQ 28
INR R02
JMP 6
HLT
Question: Can you think of another instruction in place of INR R02?

It can be ADD #01, R02.

JEQ 28 instruction?
BITS Pilani, Deemed to be University under Section 3 of UGC
Configure the cache
Block Size = 4
Cache Type = Direct Mapped
Cache Size = 16
Write Policy = Write-Back

Change Block size

BITS Pilani, Deemed to be University under Section 3 Act,

Change Block size =8
Insert Instruction LDB 9,
R08

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Reducing Cache Misses by More Flexible Placement of Blocks
Fully associative cache:
A cache structure in which a block can be placed in any location
in the
cache.
Set-associative cache:
A cache that has a fixed number of locations (at least two)
where each block can be placed.

A set-associative cache with n locations for a block is called an

n-way set- associative cache

BITS Pilani, Deemed to be University under Section 3 of UGC

Associative
Caches
Fully associative
▪Allow a given block to go in any cache
entry
▪Requires all entries to be searched at
once
▪Comparator per entry (expensive)
n-way set associative
▪Each set contains n entries
▪Block number determines which set
– (Block number) modulo (#Sets in
cache)
▪Search all entries in a given set at
once
▪n comparators (lessBITS
expensive)
Pilani, Deemed to be University under Section 3 of UGC
Associativity Examples
Cache size is 8 blocks
Where does word 12 from memory
go?

Fully associative:
Block 12 can go anywhere

Direct mapped:
Block no. = (Block address) mod
(No. of blocks in
cache)
Block 12 can go only into
block 4 (12 mod 8 = 4)
=> Access block using lower
3 bits

2-way set associative:

Set no. = (Block address)
mod
(No. of sets in
cache) Block 12 can go
BITS Pilani, Deemed to be University under Section 3 of UGC
anywhere in set 0 (12 mod 4
The location of a memory block whose
address is 12

Direct-mapped cache
(Block number) modulo (Number of blocks in the cache)

In direct-mapped placement, there is only one cache

block where memory block 12 can be found, and that
block is given by (12 modulo 8)  4.

Set-associative cache, set number

(Block number) modulo (Number of sets in the
cache)

In a two-way set-associative cache, there would be

four
sets, and memory block 12 must be in set (12 mod
4)  0;
BITS Pilani, Deemed to be University under Section 3 of UGC
The location of a memory block whose
address is 12

Set-associative cache, set number

(Block number) modulo (Number of sets in the
cache)

In a fully associative placement,

the memory block for block address 12 can
appear in any of the eight cache blocks.
BITS Pilani, Deemed to be University under Section 3 of UGC
A direct-mapped cache :a one-way set-associative cache:
A fully associative: cache with m entries is simply an m-way set-
associative cache; it
has one set with m blocks, and an entry can reside in any block within
that set BITS Pilani, Deemed to be University under Section 3 of UGC
BITS Pilani, Pilani
Replacement
Policy
• Direct mapped: no choice
• Set associative
– Prefer non-valid entry, if there is one

– Otherwise, choose among entries in the set

• Least-recently used (LRU)

– Choose the one unused for the longest time

• Simple for 2-way, manageable for 4-way, too hard

beyond that
• Random
– Gives approximately the same performance as LRU for
high
associativity
BITS Pilani, Deemed to be University under Section 3 of UGC
Associativity
Example
• Compare 4-block caches
Block Cache block
– Direct mapped, 2-way set Adrs
associative, fully associative 0 (0 modulo 4)=0
– Block access sequence: 0, 8, 0, 6 (6 modulo 4)=2
6, 8
8 (8 modulo 4)=0
• Direct mapped
Block Cach Hit/miss Cache content after access
addres e 0 1 2 3
s inde
x

0 0 miss Mem[0]
8 0 miss Mem[8]
0 0 miss Mem[0]
6 2 miss Mem[0] Mem[6]
Assume there8 are three small
0 caches, each consisting
miss Mem[8]of four one-wordMem[6]
blocks. One cache is fully
associative, a second is two-way set-associative, and the third is direct-mapped. Find the number of
misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, and 8.

BITS Pilani, Deemed to be University under Section 3 of UGC

Associativity
Example
• 2-way set associative : Two way means there are two sets with indices 0
and 1 Cach
Block Hit/miss Cache content after access
addres e Set 0 Set 1
s inde
x

0 0 miss Mem[0]
8 0 miss Mem[0] Mem[8]
0 0 hit Mem[0] Mem[8]
6 0 miss Mem[0] Mem[6]
8 0 miss Mem[8] Mem[6] Block Cache Set
Adrs
loc
Fully associative k

0 (0 modulo 2)=0
 Set-associative caches usually replace the least recently
used b within a set 6 (6 modulo 2)=0
8 (8 modulo 2)=0
Block Hit/miss Cache content after access
addres
s
0 miss Mem[0]
8 miss Mem[0] Mem[8]
0 hit Mem[0]
BITS Mem[8]
Pilani, Deemed to be University under Section 3 of UGC
What is when the block size is 8
For two way set Associative we have 2 Sets (0,1)

Memory Hit/ Contents of cache blocks after reference

Adr Miss Set0 Set0 Set0 Set0 Set1 Set1 Set1 Set1

0 M 0
8 M 0 8
0 H 0 8
6 M 0 8 6
8 H 0 8 6
there would be no replacements in the two-way set-associative
cache;

and it would have the same number of misses as the fully associative cache.
Check for yourself.: if we had 16 blocks, all 3 caches would have the same
number of misses BITS Pilani, Deemed to be University under Section 3 of UGC
Q2: How Is a Block
Found?
The address can be divided into two main
parts
▪Block offset: selects the data from the
block
offset size = log2(block
size)
▪Block address: tag + index
– index: selects set in cache
index size = log2(#blocks/associativity)
– tag: compared to tag in cache to
determine hit
tag size = addreess size -
index size - offset size
Each block has a valid bit that tells if the block is valid - the block is
in the cache if the tags match and the valid bit is set.

BITS Pilani, Deemed to be University under Section 3 of UGC

Q3: Which Block Should be Replaced on
a Miss?

Easy for Direct Mapped - only on choice

Set Associative or Fully Associative:
▪Random - easier to implement
▪Least Recently used - harder to
implement
Miss rates for caches with different
Associativit 2-way
size, associativity and replacemnt 4-way 8-way
y:
algorithm.
Size LRU Random LRU Rando LRU Random
m
16 KB 5.18 5.69% 4.67% 4.39%5.29%4.96%
%
64 KB 1.88 2.01% 1.54% 1.66% 1.39% 1.53%
%
256caches
For KB 1.15
with low 1.17% 1.13%
miss rates, 1.13%
random 1.12%
is almost 1.12%
as good
as LRU. %

BITS Pilani, Deemed to be University under Section 3 of UGC

Q4: What Happens on a
Write?
Write through: The information is written to both the block in the
cache and to the block in the lower-level memory.
Write back: The information is written only to the block in the cache.
The modified
cache block is written to main memory only when it is replaced.
▪is block clean or dirty? (add a dirty bit to
each block) Pros and Cons of each:
▪Write through
– Read misses cannot result in writes to
memory,
– Easier to implement
– Always combine with write buffers to avoid
memory latency
▪Write back
– Less memory trafficBITS Pilani, Deemed to be University under Section 3 of UGC
Q4: What Happens on a
Write?
Since data does not have to be
brought into the cache on a write
miss, there are two options:
▪Write allocate
– The block is brought into the cache on a write
miss
– Used with write-back caches
– Hope subsequent writes to the block hit in cache
▪No-write allocate
– The block is modified in memory, but not brought
into the cach
– Used with write-through caches
– Writes have to go to memory anyway, so why
bring the
block into the cache

BITS Pilani, Deemed to be University under Section 3 of UGC

To explore
To what cache set in Figure
does the word at address
0x00000014 map? Name
another address that maps to
the same set.

BITS Pilani, Deemed to be University under Section 3 of UGC

Solution
0x00000014 H
Representing only lower byte : 0001 0100
The two least significant bits of the address are 00, because the address is
word aligned.
So the alignment is 101 00
The next three bits are 101, so the word maps to set 5.

The other address are 0001 0100 to xxx1 0100

Or for that matter in Hex xxxxxx14 and in increment of 20H
Words at addresses 0x34, 0x54, 0x74, … , 0xFFFFFFF4 all map to this same
set.

BITS Pilani, Deemed to be University under Section 3 of UGC

Measuring Cache
Performance
CPU time = (CPU execution clock cycles +
Memory stall clock cycles)  Clock-
cycle time Memory stall clock cycles =
Read-stall cycles + Write-stall
cycles
Read-stall cycles = Reads/program 
Read miss rate 
Read miss penalty
Write-stall cycles = (Writes/program 
Write miss rate 
Write miss
penalty) + Write buffer stalls
(assumes write-through cache)
Write buffer stalls should be negligible and write and
read miss penalties equal (cost to fetch block from memory)
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory stall clock cycles = Mem access/program 
EXAMPLE:

Suppose we have a processor with a base CPI of 1.0, assuming all

references hit in the primary cache, and a clock rate of 4 GHz.
Assume a main memory access time of 100 ns, including all the miss
handling. Suppose the miss rate per instruction at the primary
cache is 2%.
How much faster will the processor be if we add a secondary cache
that has a 5 ns access time for either a hit or a miss and is large
enough to reduce the miss rate to main memory to 0.5%?

BITS Pilani, Deemed to be University under Section 3 of UGC

Solution
The miss penalty to main memory 0.25𝑛is = 100𝑛𝑠 = 400
𝑐𝑦𝑐𝑙𝑒𝑠 𝑠
The effective CPI with one level of caching is given
by Total CPI = Base CPI + Memory-stall cycles per
instruction

For the processor with one level of caching,

Total CPI = 1.0 + Memory-stall cycles per
instruction
= 1.0 + 2% × 400 = 9

= 20
5𝑛(or first-level)
With two levels of caching, a miss in the primary
𝑠
0.25𝑛
is = be satisfied either by the secondary𝑠 cache
𝑐𝑦𝑐𝑙𝑒𝑠
penalty for an access to the second-level
cache
cache can or by main
memory. The miss BITS Pilani, Deemed to be University under Section 3 of UGC
Thus, for a two-level cache, total CPI is the sum of the stall
cycles from both levels of cache and the base CPI:

Total CPI = 1 + Primary stalls per instruction + stalls

per
Secondary instruction
= 1 + 2% × 20 + 0.5% × 400
= 1 + 0.4 + 2.0 = 3.4

Thus, the processor with the secondary cache is

faster by 9.0/3.4= 2.6

BITS Pilani, Deemed to be University under Section 3 of UGC

Comparison of different cache mapping, Directly
Mapped cache, set associative and fully associative

First flush the contents of the cache by clicking on the

FLUSH button. Then enter the following instructions after
the last LDB instruction in the above program:
LDB 0, R00
LDB 4, R00
LDB 17, R00
Now select the option
LDB 20, R00 Block Size = 4
Cache Type = Set Associative
Cache Size = 16
Set Blocks = 2-way
Write Policy = Write-Back

BITS Pilani, Deemed to be University under Section 3 of UGC

How much of a reduction in the miss
rate is achieved by associativity?

for a 64 KB data cache with a 16-word

block

Going from one-way(direct) to two-way associativity decreases the

miss rate by about 15%, but there is little further improvement in
going to higher associativity.

BITS Pilani, Deemed to be University under Section 3 of UGC

Effect of associativity on
performance

15
%
1
12 KB
%
2
9 KB
%
4
6 KB
% 8
KB
3 16
% KB
32
64 128
KB
0 KB KB
One- Two-way Four- Eight-
way Associativi way way

BITS Pilani, Deemed to be University under Section 3 of UGC

Increasing associativity requires more comparators and
more tag bits
Assuming per cache
a cache block. a 16-byte block
of 4K blocks,
size, and a
32-bit address, find the total number of sets and the
total number of tag bits for caches that are required for
a direct Assuming a cache of 4K blocks, a 16-
. mapped, block size, and a 32-bit
Byte
b four-way
two-way set address
. associative,
and
c. and fully associative.
•Direct
d mapped cache:
• .16 bytes/block  28 bits for tag and index # sets = # blocks
• Log(4K) = 12 bits for index  16 bits for tag
• Total # of tag bits = 16 bits x 4K locations = 64 Kbits

Two-way set-associative cache:

32 bytes / set
16 bytes/block  28 bits for tag and index
# sets = # blocks / 2  2K sets
Log(2K) = 11 bits for index  17 bits for tag
Total # of tag bits = 17 bits x 2 location / set x 2K sets = 68 Kbits
BITS Pilani, Deemed to be University under Section 3 of UGC
Assuming a cache of 4K blocks, a 16-Byte block
size, and a 32-bit address
Sol:
Four-way set-associative cache:
64 bytes / set
16 bytes/block  28 bits for tag and index
# sets = # blocks / 4  1K sets
Log(1K) = 10 bits for index  18 bits for tag
Total # of tag bits = 18 bits x 4 location /
set x 1K sets = 72 Kbits

Fully associative cache:

1 set of 4 K blocks  28 bits for tag and index
Index = 0 bits  tag will have 28 bits
Total # of tag bits = 28 bits x 4K location / set x
1 set = 112 Kbits
BITS Pilani, Deemed to be University under Section 3 of UGC
Multilevel Cache
Considerations
Primary cache
▪Focus on minimal hit time
L-2 cache
▪Focus on low miss rate to avoid main memory
access
▪Hit time has less overall
impact Results
▪L-1 cache usually smaller
than a single cache
▪L-1 block size smaller than
L-2 block size

BITS Pilani, Deemed to be University under Section 3 of UGC

Reducing the Miss Penalty using Multilevel
Caches

 To further reduce the gap between fast clock rates of CPUs and the relatively long time
to access memory additional levels of cache are used (level two and level three
caches).
 The primary cache is optimized for a fast hit rate, which implies a relatively small
size
 A secondary cache is optimized to reduce the miss rate and penalty
needed to go to memory.
 Example:
 Assume CPI = 1 (with all hits) and 5 GHz clock
 100 ns main memory access time
 2% miss rate for primary cache
 Secondary cache with 5 ns access time and miss rate of .5%
 What is the total CPI with and without secondary cache?
BITS Pilani, Deemed to be University under Section 3 of UGC
Reducing the Miss Penalty using
Multilevel Caches
 The miss penalty to main
memory: 100 ns / .2 ns per
cycle = 500 cycles

 For the processor with only L1

cache:
Total CPI = 1 + 2% x 500 = 11

 The miss penalty to access L2

cache: 5 ns / .2 ns per cycle =
25 cycles

 If the miss is satisfied by L2

cache, then this is the only miss
penalty.

 If the miss has to be resolved by the main memory, then the

total miss penalty is the sum of both

 For the processor with both L1 and L2

BITS Pilani, Deemed to be University under Section 3 of UGC
caches: Total CPI = 1 + 2% x 25 + 0.5%
Memory Hierarchy
Framework
Three Cs used to model our memory
hierarchy
▪ Compulsory misses
– Cold-start misses caused by the first access to a block
– Solution is to increase the block size
▪ Capacity misses
– Caused when the cache is full and block needs to be
replaced
– Solution is to enlarge the cache
▪ Conflict misses
– Collision misses caused when multiple blocks compete
for the
same set, in the case of set-associative and fully-
associative
BITS Pilani, Deemed to be University under Section 3 of UGC
Design
Tradeoffs
As in everything in engineering, multiple design tradeoffs
exist when discussing memory hierarchies
There are many more factors involved, but the presented ones are
the most important and accessible ones

Change Effect on miss rate Negative effect

Increase size Decreases capacity misses May increase
access time
Increase Decreases miss rate due to May increase
associativity conflict misses access time
Increase ecreases miss rate for a wide May increase
block size range of block size miss penalty
BITS Pilani, Deemed to be University under Section 3 of UGC
Exampl
e
A computer system contains a main memory of 32K 16-bit
words. It has also a 4Kword cache divided into 4-line sets with
64 words per line. The processor fetches words from locations
0, 1, 2 , … , 4351 in that order sequentially 10 times. The
cache is 10 times faster than the main memory. Assume LRU
policy.

With no cache
Fetch time = (10 passes) (68 blocks/pass) (10T/block) =
6800T
With cache
Fetch time =(68)
(11T) first pass
other
+ (9) (48) (T) + (9) (20) passes
(11T) BITS Pilani, Deemed to be University under Section 3 of UGC
Modern
Systems

BITS Pilani, Deemed to be University under Section 3 of UGC

four-way set-
associative cache

BITS Pilani, Deemed to be University under Section 3 of UGC

Input/Output Module
Necessity
• Wide variety of peripherals
o Delivering different amounts of data (data size)
o At different speeds ,can be slower or faster than the
processor or memory
o In different formats
• All slower than CPU and RAM
• Need I/O modules
• Interface to CPU and Memory
• Interface to one or more peripherals

External Device
 An external device connects to the computer by a link to an I/O module

 This is called peripheral device or peripheral

 Link could be data line. Control line or status line

BITS Pilani, Deemed to be University under Section 3 of UGC

Generic Model of I/O
Module

BITS Pilani, Deemed to be University under Section 3 of UGC

External Devices
classification
• Human readable suitable for user
o Screen, printer, keyboard

• Machine readable suitable for machine

o Monitoring and control

• Communication suitable for remote devices

o Modem
o Network Interface Card (NIC)

External Device Block

Diagram

BITS Pilani, Deemed to be University under Section 3 of UGC

I/O Module
Function

•Control & Timing

•CPU Communication
•Device
Communication
•Data Buffering
•Error Detection

BITS Pilani, Deemed to be University under Section 3 of UGC

I/O module control Steps

•CPU checks I/O module device

status
•I/O module returns status
•If ready, CPU requests data
transfer
through a command to the I/O
module
•I/O module gets data from device
•I/O module transfers data to CPU
BITS Pilani, Deemed to be University under Section 3 of UGC
For all these there has to be communication between the processor and
the I/O module , thus
Processor communication involves
 Command decoding
 data exchange on the data bus
 Status reporting eg. BUSY, READY
Address recognition : I/O module
should recognize each peripheral that it
controls

BITS Pilani, Deemed to be University under Section 3 of UGC

I/O Module
Diagram

BITS Pilani, Deemed to be University under Section 3 of UGC

I/O Module
Decisions

• Hide or reveal device properties to CPU

• Support multiple or single device
• Control device functions or leave for CPU
• I/O channel or I/O processor takes
most of the processing burden

• I/O controller or device controller requires

detailed control from the processor

•Also O/S decisions

o e.g. Unix treats everything it can as a
BITS Pilani, Deemed to be University under Section 3 of UGC
Input Output
Techniques

⚫ Recall that the rate of transfer to and from I/O devices is

slower than the speed of the processor. This creates the need
for mechanisms to synchronize data transfers between them.
⚫ Program-controlled I/O:

⚫ Processor repeatedly monitors a status flag to achieve the

necessary synchronization.
⚫ Processor polls the I/O device.

⚫ Two other mechanisms used for synchronizing data transfers

between the processor and memory:
⚫ Interrupts.

⚫ Direct Memory Access

BITS Pilani, Deemed to be University under Section 3 of UGC
Three Techniques for Input of
a Block of Data

BITS Pilani, Deemed to be University under Section 3 of UGC

Programmed
I/O
• CPU has direct control over I/O
o Sensing status
o Read/write commands
o Transferring data
• CPU waits for I/O module to complete
operation
• Wastes CPU time
• CPU requests I/O operation
• I/O module performs operation
• I/O module sets status bits
• CPU checks status bits periodically
• I/O module does not inform CPU directly
• I/O module does not interrupt CPU
• CPU may wait or come back later
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O
Commands
• CPU issues address
o Identifies module (& device if >1 per module)
• CPU issues command
o Control - telling module what to do
 e.g. spin up disk
o Test - check status
 e.g. power? Error?
o Read/Write
 Module transfers data via buffer from/to device
 Addressing I/O Devices
• Under programmed I/O data transfer is very like
memory access
(CPU viewpoint)
• Each device given unique identifier
• CPU commands contain identifier (address)
BITS Pilani, Deemed to be University under Section 3 of UGC
I/O
Mapping

• Memory mapped I/O

oDevices and memory share an address space
oI/O looks just like memory read/write
oNo special commands for I/O
 Large selection of memory access commands
available

• Isolated I/O
oSeparate address spaces
oNeed I/O or memory select lines
oSpecial commands for I/O
 Limited set

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory Mapped and Isolated
I/O

BITS Pilani, Deemed to be University under Section 3 of UGC

Interrupt Driven
I/O

• Overcomes CPU waiting

• No repeated CPU checking of device
• I/O module interrupts when ready
 Basic Operation
• CPU issues read command
• I/O module gets data from peripheral whilst
CPU does other work
• I/O module interrupts CPU
• CPU requests data
• I/O module transfers data
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrup
ts
In program-controlled I/O, when the processor continuously
monitors the status of the device, it does not perform any
useful tasks.
An alternate approach would be for the I/O device to alert the
processor when it becomes ready.
▪Do so by sending a hardware signal called an interrupt to
the
processor.
▪At least one of the bus control lines, called an interrupt-
request line is dedicated for this purpose.
Processor can perform other useful tasks while it is waiting for
the
device to be ready. BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupt
s
(contd..) Program
1
1
Interrupt Service
routine

2
Interru
pt i
occurs
here i+
1

•Processor is executing the instruction located at address i when an

interrupt occurs.
•Routine executed in response to an interrupt request is called the
interrupt-service routine.
•When an interrupt occurs, control must be transferred to the interrupt
service routine.
•But before transferring control, the current contents of the PC (i+1),

must be saved in a known location.

BITS Pilani, Deemed to be University under Section 3 of UGC
•This will enable the return-from-interrupt instruction to resume
Interrupts (contd..)
⚫ Treatment of an interrupt-service routine is very similar to that of a
subroutine.
⚫ However there are significant differences:
⚫A subroutine performs a task that is required by the calling
program.
⚫Interrupt-service routine may not have anything in
common with the program it interrupts.
⚫Interrupt-service routine and the program that it interrupts
may belong to different users.
⚫As a result, before branching to the interrupt-service routine,
not only the PC, but other information such as condition code
flags, and processor registers used by both the interrupted
program and the interrupt service routine must be stored.
⚫This will enable the interrupted program to resume execution
BITS Pilani, Deemed to be University under Section 3 of UGC
Interrupts
(contd..)
⚫ Saving and restoring information can be done automatically by the
processor or explicitly by program instructions.
⚫ Saving and restoring registers involves memory transfers:
⚫Increases the total execution time.
⚫Increases the delay between the time an interrupt request is
received, and the start of execution of the interrupt-service
routine. This delay is called interrupt latency.
⚫ In order to reduce the interrupt latency, most processors save only
the minimal amount of information:
⚫This minimal amount of information includes Program
Counter and processor status registers.
⚫ Any additional information that must be saved, must be
saved explicitly by the
program instructions at the beginning of the interrupt service
BITS Pilani, Deemed to be University under Section 3 of UGC
routine.
Interrupts
(contd..)
When a processor receives an interrupt-request, it must branch to
the interrupt service routine.
It must also inform the device that it has recognized the
interrupt request. This can be accomplished in two ways:
▪Some processors have an explicit interrupt-acknowledge
control signal for this
purpose.
▪In other cases, the data transfer that takes place between the
device and the processor can be used to inform the device.

BITS Pilani, Deemed to be University under Section 3 of UGC

Simple Interrupt
Processing

CPU Viewpoint
•Issue read command
• Do other work
• Check for interrupt at
end of each
instruction cycle
• If interrupted:-
o Save
context
(registers)
o Process
interrupt
 Fetch
data
BITS Pilani, Deemed to be University under Section &
3 of UGC
Changes in Memory and
Registers for an Interrupt

BITS Pilani, Deemed to be University under Section 3 of UGC

Design Issues
• How do you identify the module issuing the
interrupt?
• How do you deal with multiple interrupts?
o i.e. an interrupt handler being interrupted
 Identifying Interrupting Module (1)
• Different line for each module
o PC
o Limits number of devices
•Software poll
o CPU asks each module in turn eg. TESTI/O
o or reads the status register of each I/O
module
o Slow & time consuming

BITS Pilani, Deemed to be University under Section 3 of UGC

Identifying Interrupting Module
(2)
• Daisy Chain or Hardware poll
o Interrupt Acknowledge sent down a chain
o Module responsible places vector on bus
o CPU uses vector to identify handler routine
• Bus Master
o Module must claim the bus before it can raise
interrupt
o e.g. PCI & SCSI

Daisy-chaining interrupts at
the same level of priority

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiple
Interrupts
• Each interrupt line has a priority
• Higher priority lines can interrupt lower priority
lines

• If bus mastering only current master can

interrupt

• Example - PC Bus
• 80x86 has one interrupt line
• 8086 based systems use one 8259A interrupt
controller
• 8259A has 8 interrupt lines

BITS Pilani, Deemed to be University under Section 3 of UGC

Interrupts (contd..)
Consider a system employing interrupt-driven I/O for a particular
device that transfers data at an average of 8 KB/s on a continuous
basis. a. Assume that interrupt processing takes about 100 µs (i.e.,
the time to jump to the interrupt service routine (ISR), execute it,
and return to the main program).
Determine what fraction of processor time is consumed by this I/O
device if it interrupts for every byte.

BITS Pilani, Deemed to be University under Section 3 of UGC

Interrupts (contd..)
Now assume that the device has two 16-byte buffers and interrupts
the processor when one of the buffers is full. Naturally, interrupt
processing takes longer, because the ISR must transfer 16 bytes.
While executing the ISR, the processor takes about 8µ s for the
transfer of each byte. Determine what fraction of processor time is
consumed by this I/O device in this case

BITS Pilani, Deemed to be University under Section 3 of UGC

Interrupts (contd..)
Now assume that the processor is equipped with a block transfer I/O
instruction such as that found on the Z8000. This permits the associated
ISR to transfer each byte of a block in only 2 µs. Determine what fraction
of processor time is consumed by this I/O device in this

BITS Pilani, Deemed to be University under Section 3 of UGC

Interrupts
(contd..)
A particular system is controlled by an operator through commands
entered from a keyboard. The average number of commands entered
in an 8-hour interval is 60.
a. Suppose the processor scans the keyboard every 100 ms. How many
times will
the keyboard be checked in an 8-hour period?
b.By what fraction would the number of processor visits to the keyboard
be reduced if interrupt-driven I/O were used?

BITS Pilani, Deemed to be University under Section 3 of UGC

Sequence of Events
• 82C59A accepts interrupts
• 82C59A determines priority
• 82C59A signals 8086 (raises INTR line)
• CPU Acknowledges
• 82C59A puts correct vector on data bus
• CPU processes interrupt
 ISA Bus Interrupt System
• ISA bus chains two 82C59As together
• Link is via interrupt 2
• Gives 15 lines
o 16 lines less one for link
• IRQ 9 is used to re-route anything trying to use
IRQ 2
• Incorporated
o Backwardsincompatibility
chip BITS Pilani, Deemed to be University under Section 3 of UGC
82C59A Interrupt
Controller

BITS Pilani, Deemed to be University under Section 3 of UGC

Intel 82C55A
Programmable Peripheral Interface

BITS Pilani, Deemed to be University under Section 3 of UGC

The Intel 8255A
Control Word

(b) Bit definitions of the 8255 control

(a) Mode denition of the 8255 control register to

configure the 8255
BITS Pilani, Deemed to be University under Section 3 of UGC
Keyboard/Display Interface to
8255A

BITS Pilani, Deemed to be University under Section 3 of UGC

Direct Memory
Access
• Interrupt driven and programmed I/O require
active CPU intervention
o Transfer rate is limited

o CPU is tied up

• DMA is the answer

 DMA Function
• Additional Module (hardware) on bus
• DMA controller takes over from CPU for I/O

BITS Pilani, Deemed to be University under Section 3 of UGC

DM
A

BITS Pilani, Deemed to be University under Section 3 of UGC

Typical DMA Module
Diagram DMA
•
Operation
CPU tells DMA controller:-
o Read/Write
o Device address
o Starting address of
memory block for data
o Amount of data to
be transferred
• CPU carries on with
other work
• DMA controller deals
with transfer
• DMA controller sends
interrupt when finished

BITS Pilani, Deemed to be University under Section 3 of UGC

DMA Transfer Cycle
Stealing
• DMA controller takes over bus for a cycle
• Transfer of one word of data
• Not an interrupt
o CPU does not switch context

• CPU suspended just before it accesses bus

o i.e. before an operand or data fetch or a data
write

• Slows down CPU but not as much as CPU doing

transfer

BITS Pilani, Deemed to be University under Section 3 of UGC

DMA and Interrupt Breakpoints
During an Instruction Cycle

BITS Pilani, Deemed to be University under Section 3 of UGC

DMA
(contd..)
A DMA module is transferring characters to memory using cycle stealing,
from a device transmitting at 9600 bps. The processor is fetching
instructions at the rate of 1 million instructions per second (1 MIPS). By
how much will the processor be slowed down due to the DMA activity?

BITS Pilani, Deemed to be University under Section 3 of UGC

DMA Configurations
(1)

•Single Bus, Detached DMA

controller
•Each transfer uses bus twice
oI/O to DMA then DMA to memory

•CPU is suspended twice

BITS Pilani, Deemed to be University under Section 3 of UGC
DMA Configurations
(2)

• Single Bus, Integrated DMA

controller
• Controller may support >1
device
• Each transfer uses bus once
o DMA to memory
• CPU is suspended once
BITS Pilani, Deemed to be University under Section 3 of UGC
DMA Configurations
(3)

• Separate I/O Bus

• Bus supports all DMA enabled
devices
• Each transfer uses bus once
o DMA to memory
• CPU is suspended once
BITS Pilani, Deemed to be University under Section 3 of UGC
Intel 8237A DMA
Controller
• Interfaces to 80x86 family and DRAM
• When DMA module needs buses it sends HOLDRequest (HRQ) signal to
processor

• CPU responds HLDA (hold acknowledge)

o DMA module can use buses

• E.g. transfer data from memory to disk

1. Device requests service of DMA by pulling DREQ (DMA

request) high 2.DMA puts high on HRQ (hold request),

3.CPU finishes present bus cycle (not necessarily present
instruction) and puts
high on HDLA (hold acknowledge). HOLD remains active for
duration of DMA

4.DMA activates DACK (DMA acknowledge), telling device to start

transfer 5.DMA starts transfer by putting address of first byte on

address bus and

8237 DMA
Usage of
Systems Bus

BITS Pilani, Deemed to be University under Section 3 of UGC

Fly-
By(8237)
• While DMA using buses processor idle
• Processor using bus, DMA idle
o Known as fly-by DMA controller

• Data does not pass through and is not stored in

DMA chip
o DMA only between I/O port and memory

o Not between two I/O ports or two memory

locations

• Can do memory to memory via register

• 8237 contains four DMA channels
o Programmed independently

o Any one active

BITS Pilani, Deemed to be University under Section 3 of UGC
I/O Channels and its evolution
• I/O devices getting more sophisticated
• e.g. 3D graphics cards
• CPU instructs I/O controller to do transfer
• I/O controller does entire transfer
• I/O module is enhanced to become a processor in its
own right

• I/O module has memory of its own

• Improves speed
o Takes load off CPU

o Dedicated processor is faster

BITS Pilani, Deemed to be University under Section 3 of UGC

I/O Channel
e
Architectur
 A selector channel controls multiple high-
speed devices and, at any one time, is
dedicated to the transfer of data with one of
those devices. Thus, the I/O channel selects
one device and effects the data transfer

 A multiplexor channel can handle I/O

with multiple devices at the same time
 Thus, the I/O channel serves in place of the
CPU in
controlling these I/O controllers

BITS Pilani, Deemed to be University under Section 3 of UGC

Example
A given processor requires 1000 cycles to perform a context switch and start an
interrupt handler (and the same number of cycles to switch back to the program
that was running when the interrupt occurred), or 500 cycles to poll an I/O
device.An I/O device attached to that processor makes 150 requests per second,
each of which takes 10,000 cycles to resolve once the handler has been started.
By default , the processor polls every 0.5 ms if it is not using interrupts

How many cycles per second does the processor spend handling I/O from

the device if interrupts are used?

Sol: The device makes 150 requests, each of which require one interrupt.
Each interrupt

takes 12,000 cycles (1000 to start the handler, 10,000 for the handler,1000
to switch

back to the original program), for a total of 1,800,000 cycles spent handling
BITS Pilani, Deemed to be University under Section 3 of UGC
Example contnd.
b. How many cycles per second are spent on I/O if polling is used (include all polling
attempts)? Assume the processor only polls during time slices when user programs
are not running, so do not include any context-switch time in your calculation

Sol: The processor polls every 0.5 ms, or 2000 times/s. Each polling attempt takes
500 cycles, so it spends 1,000,000 cycles/s polling. In 150 of the polling attempts,
a request is waiting from the I/O device, each of which takes 10,000 cycles to
complete for another 1,500,000 cycles.
Therefore, the total time spent on I/O each second is 2,500,000 cycles with polling.

C. How often would the processor have to poll for polling to take as many cycles
per second as

interrupts

Sol: In the polling case, the 150 polling attempts that find a request waiting
consume 1,500,000

cycles. Therefore, for polling to match the interrupt case, an additional 300,000
BITS Pilani, Deemed to be University under Section 3 of UGC
Example 2.
An I/O device transfers 10 MB/s of data into the memory of a processor over the I/O
bus, which has

a total data transfer capacity of 100 MB/s. The 10 MB/s of data is transferred as 2500
independent

pages of 4 KB each. If the processor operates at 200 MHz, it takes 1000 cycles to

initiate a DMA transaction, and 1500 cycles to respond to the device's interrupt when

the DMA transfer completes, what fraction of the processor's time is spent handling

the data transfer with and without DMA?

Sol:Without DMA, the processor must copy the data into the memory as the I/O device
sends it over the bus. Because the device sends 10 MB/s over the I/O bus, which has a
capacity of 100 MB/s, 10% of each second is spent transferring data
over the bus.
Assuming the processor is busy handling data during the time that each page is being
transferred over the bus (which is a reasonable assumption because the time
between transfers is too short to be worth doing a context
BITS Pilani, Deemed toswitch), then
be University 10%
under of3the
Section of UGC
Example 2. Sol
With DMA, the processor is free to work on other tasks, except when
initiating each DMA and responding to the DMA interrupt at the end of
each transfer. This takes 2500 cycles/transfer, or a total of 6,250,000
cycles spent handling DMA each second.
Because the processor operates at 200 MHz, this means that 3.125% of
each second, or 3.125% of the processor's time, is spent handling DMA,
less than 1/3 of
the overhead without DMA.

BITS Pilani, Deemed to be University under Section 3 of UGC

Example
A microprocessor provides an instruction capable of moving a string of
bytes from one area of memory to another. The fetching and initial
decoding of the instruction takes 10 clock cycles. Thereafter, it takes 15
clock cycles to transfer each byte. The microprocessor is clocked at a
rate of 10 GHz.
a. Determine the length of the instruction cycle for the case of a string
of 64 bytes.
b.What is the worst-case delay for acknowledging an interrupt if the
instruction is non-interruptible?
c.Repeat part (b) assuming the instruction can be interrupted at the
beginning of each byte transfer.

BITS Pilani, Deemed to be University under Section 3 of UGC

 Interesting information
 the 1st drive with a capacity 1 GByte ,
produced
 by IBM in 1980
 weighed 250 kg
 cost $40,000
 size of a kitchen appliance

2 BITS Pilani, Deemed to be University under Section 3,

Magnetic
disk
 A disk is a circular platter constructed of
nonmagnetic material, called the substrate,
coated with a magnetizable material.
 Traditionally, the substrate has been an
aluminum or aluminum alloy material. More
recently, glass substrates have been introduced.
 The glass substrate has a number of benefits
 Improvement in the uniformity of the magnetic film surface
to increase
disk reliability
 A significant reduction in overall surface defects to help
reduce read-write errors
 Better stiffness to reduce disk dynamics
 Greater ability to withstand shock and damage

BITS Pilani, Deemed to be University under Section 3 of UGC

Disk Data Layout

BITS Pilani, Deemed to be University under Section 3 of UGC

Organization of Data on a
Disk

Sector 0,
Sector 2,
track 1
trackn Sector 0,
track 0

The set of corresponding tracks on all surfaces

of a stack of disk form a cylinder
Organization of one surface of a The data are accessed by specifying the
disk. surface number, the track and the sector
number

BITS Pilani, Deemed to be University under Section 3 of UGC

Disk Performance Parameter

 Seek time: is the time required to move the disk arm to the required track.
 Rotational delay: Disks, other than floppy disks, rotate at speeds ranging from
3600 rpm (for handheld devices such as digital cameras) up to, as of this writing,
20,000 rpm;
 Transfer time : The transfer time to or from the disk depends on the rotation
speed of the disk T = b /rN
T = transfer time ;b = number of bytes to be transferred; N = number of bytes
on a track ; r = rotation speed, in revolutions per second
Ttotal = Ts + 1/2r + b/rN

BITS Pilani, Deemed to be University under Section 3 of UGC

A timing
comparison
Consider a disk with an advertised average seek time of 4 ms, rotation speed of 15,000
rpm, and 512-byte sectors with 500 sectors per track. Suppose that we wish to read a file
consisting of 2500 sectors for a total of 1.28 Mbytes

First, let us assume that the file is stored as compactly as possible on the disk. That is, the
file occupies all of the sectors on 5 adjacent tracks (5 tracks * 500 sectors/track = 2500
sectors). This is known as sequential organization
Average seek 4 ms; Average rotational delay 2 ms
Read 500 sectors= b/rN= (512X500 X 60)/(15000 X500X512)=4 ms ; Thus Total=10 ms
Suppose that the remaining tracks can now be read with essentially no seek time.
That is, the I/O operation can keep up with the flow from the disk. Then, at most, we
need to deal with rotational delay for the four remaining tracks. Thus each successive
track is read in 2 + 4 = 6 ms.
To read the entire file, Total time = 10 + (4 * 6) = 34 ms = 0.034 seconds

BITS Pilani, Deemed to be University under Section 3 of UGC

Now let us calculate the time required to read the same data using random access rather
than sequential access;
that is, accesses to the sectors are distributed randomly over the disk.
For each sector, we have
Average seek 4 ms
Rotational delay 2 ms
Read 1 sector =
b/rN= (512X
60)/(15000
X500X512)= 0.008 ms
Which is
6.008 ms
Total time = 2500 * BITS Pilani, Deemed to be University under Section 3 of UGC
Access Time
 Seek Time: 5 to 8 ms( Time required to move the
read/write head to proper track
 Rotational delay or latency time(time to reach the
correct sector over the track:
 On average This is the time for half a rotation of disk
 Access Time is the sum of these delays
 Disk capacity( of a 3.5 inch diameter )
 20 Data recording surface
 15000 tracks / surface
 400 sectors per track

 20 X 15000 X 400 X 512 ≅ 60 × 109= 60

 each sector contains 512 bytes of data

GBytes
 Access time
=
6
0

10000×
 Typical Average seek time 6 ms
= 3𝑚𝑠
rotate at 10,000 rpm so latency
2
 BITS Pilani, Deemed to be University under Section 3 of UGC
Some basic
concepts
• Maximum size of the Main Memory is determined
by the addressing scheme
• byte-addressable(big endian or…
Little endian)
• CPU-Main Memory Connection

Processor Memory
k-
address
bit
MAR bus
n-bit
data
bus Up to 2k
MDR addressable
locations

Word length = n
Control lines bits
( R / W , MFC,
etc.)
BITS Pilani, Deemed to be University under Section 3 of UGC
Big Endian or Little
Endian

16 bit address capable

of 2^16= 64k Mempry
locations

What abt with 32 bit

instructions?

2^32= 4 G memory
locations

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory timing definitions
 Memory Access time: The time between
initiation of a task and its completion for example
time elapsed between Read and MFC (memory
function complete)
 Memory Cycle time: the minimum time delay
between two successive memory operations , eg:
the time between two successive write operations
 Memory cycle time is usually larger than
the Memory access time

BITS Pilani, Deemed to be University under Section 3 of UGC

Internal organization of memory
chips 16 X 8 orgaization

BITS Pilani, Deemed to be University under Section 3 of UGC

Internal organization of memory
chips

• Each memory cell can hold one bit of information.

• Memory cells are organized in the form of an array.
• One row is one memory word.
• All cells of a row are connected to a common line,
known as the “word line”.
• Word line is connected to the address decoder.
• Sense/write circuits are connected to the data
input/output
lines of the memory chip.
• The memory circuit stores 128 bits
• Requires 14 external connections for address data
and
control lines
• & another two for Vdd and Grnd
BITS Pilani, Deemed to be University under Section 3 of UGC
Address is 404 for a 1K‐word
memory.
Comparison wrt to
Decoder requirement

How many AND(NAND)

gates required for 10 to 1k
decoder?

How many AND(NAND)

gates
required for 5 to 32k
decoder?

BITS Pilani, Deemed to be University under Section 3 of UGC

Address multiplexing for a 64K
DRAM
 After a time
equivalent to the
settling time of the
row selection, RAS
goes back to the 1
level.
 Registers are
used to store the
addresses of the
row and column.
 CAS must go
back to the 1 level
before initialing
another memory
operation.
BITS Pilani, Deemed to be University under Section 3 of UGC
Memory
organisation
Figure indicates how to construct a module of chips that can store 1 MB
based on a group of four 256-Kbyte chips. Let’s say this module of chips is
packaged as a single 1-MB chip, where the word size is 1 byte. Give a
high-level chip diagram of how to construct an 8-MB computer memory
using eight 1-MB chips. Be sure to show the address lines in your diagram
and what the address lines are used for.

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory
organisation

BITS Pilani, Deemed to be University under Section 3 of UGC

SRAM Ce
l
• Two transistor inverters are cross connected to implement a
basic flip-flop.
• The cell is connected to one word line and two bits lines by
transistors T1 and T2
• When word line is at ground level, the transistors are turned off
and the
latch retains its state
• Read operation: In order to read state of SRAM cell, the word
line is activated to close switches T1 and T2. Sense/Write
b
circuits at the bottom monitor the state ofbb and b’

T2
T X Y
1

Word line
Bit lines

BITS Pilani, Deemed to be University under Section 3 of UGC

DRAM
S

 The address line is activated when

the bit value from this cell is to
be read or written
 The transistor acts as a switch
 In a read operation the sense
amplifier connected to bit line
decides whether the voltage is above
some threshold
 Then it drives the bit line to the
full voltage representing 1
 for zero the bit line is grounded
thus discharging the capacitor
completely
BITS Pilani, Deemed to be University under Section 3 of UGC
Asynchronous DRAMs

• Static RAMs (SRAMs):

– Consist of circuits that are capable of retaining their
state as long as the power is applied.
– Volatile memories, because their contents are lost when
power is interrupted.
– Access time of static RAMs are in the range of few
nanoseconds.
– However, the cost is usually high.
• Dynamic RAMs (DRAMs):
– Do not retain their state indefinitely. So called dynamic
– Contents must be periodically refreshed.
– Contents may be refreshed while accessing them for
reading.

BITS Pilani, Deemed to be University under Section 3 of UGC

DRAM
operation
Consider a dynamic RAM that must be given a refresh cycle 64
times per ms. Each refresh operation requires 150 ns; a memory
cycle requires 250 ns.
What percentage of the memory’s total operating time must be
given to refreshes?

BITS Pilani, Deemed to be University under Section 3 of UGC

DRAM
timing

Figure shows a simplified timing diagram for a DRAM read operation over a
bus. The access time is considered to last from t1 to t2. Then there is a
recharge time, lasting from t2 to t3, during which the DRAM chips will have
to recharge before the processor can access them again.
a. Assume that the access time is 60 ns and the recharge time is 40 ns.
What is the memory cycle time? What is the maximum data rate this
DRAM can sustain, assuming a 1-bit output?
b. b. Constructing a 32-bit wide memory system using these chips
yields what data transfer rate?

BITS Pilani, Deemed to be University under Section 3 of UGC

DRAM
timing

BITS Pilani, Deemed to be University under Section 3 of UGC

Internal organization of 2M X 8
dynamic memory chip
Arranged in 4k X 4k
• Each row can store 512
RAS
bytes. 12 bits to select a
row, and 9 bits to select a
group in a row. Total of 21
Row Row 4096 bits.
addres
decode 512 8
s latch
r cell array • First apply the row
address, RAS signal
latches the row address.
Then apply the column
Sense / CS address, CAS signal
A20- 9  A8 -
Write
0 circuits R/ W latches the address.
• Timing of the memory unit
is controlled by a
Colum
n Colum specialized unit which
n
addres decode generates RAS and CAS.
s latch r
• This is asynchronous
CAS D7 D0 DRAM

BITS Pilani, Deemed to be University under Section 3 of UGC

Synchronous
DRAMS
• Operation is
directly synchronized with
processor clock signal.
• The outputs of the
sense circuits are connected
to a latch.
• During a Read
operation, the contents
of the cells in a row are
loaded onto the latches.
• During a refresh
operation, the contents of
the cells are refreshed
without changing the
contents of the latches.
• Data held in the
latches correspond to the
selected columns are
transferred to the output.
• For a burst mode of
operation, successive columns
are under
BITS Pilani, Deemed to be University selected
Section 3using
of UGC column
Static memories implementation : example

Start here 12/04/2024

 Implement a memory unit of 2M words of 32
bits each.
 Use 512kx8 static memory chips.
 Each column consists of 4 chips.
 Each chip implements one byte data.
 A chip is selected by setting its chip select
control line to 1.
 Selected chip places its data on
the data output line, outputs of other
chips are in high impedance state.
 21 bits to address a 32-bit word.
 High order 2 bits are needed to select
the row, by activating the four Chip Select
signals.
 19 bits are used to access specific byte
BITS Pilani, Deemed to be University under Section 3 of UGC
Static memories implementation of 2M X 32 bits Implemented as
512k X 8 X 4

BITS Pilani, Deemed to be University under Section 3 of UGC

Latency, Bandwidth, and DDRSDRAMs
 Memory latency is the time it takes to transfer a word of data to or
from memory
 Memory bandwidth is the number of bits or bytes that can
be transferred in one second.
 DDRSDRAMs: on both edges
Cell array is organized in two banks

BITS Pilani, Deemed to be University under Section 3 of UGC

Memory
controller

Row/
Address address
Column
RAS
R/ W
Memor CAS
Request y R/ W
Processor controll
CS Memory
er
Clock
Clock

Dat
a

BITS Pilani, Deemed to be University under Section 3 of UGC

Refreshing overhead

 When used with DRAM chips which do not have

a self refreshing capability Memory controller has
to provide all the information needed for
refershing
 Consider a memory of 8k= 8192 Rows
 4 cycles to refresh each row then 8192 X 4
= 32768 cycles to refresh all rows
 In typical SDRAMs a typical refresh period is
64 ms
 at a clock rate of 133Mhz
 time needed to refresh all rows =
32768/133MHz = 0.246ms
 overhead= 0.246/64= 0.0038 = 0.4% total
time available for accessing memory
BITS Pilani, Deemed to be University under Section 3 of UGC
Interleaving

◾ Divides the memory system into a

number of memory modules. Each module has
its own address buffer register (ABR) and data buffer register (DBR).
◾ Arranges addressing so that successive
words in the address space are
placed in different modules.
◾ When requests for memory access
involve consecutive addresses, the
access will be to different modules.
◾ Since parallel access to these
modules is possible, the average rate
of fetching words from the Main
Memory can be increased.
BITS Pilani, Deemed to be University under Section 3 of UGC
Methods of address
layouts
k m
bits bits
Module Address in MM
module address

ABR DBR ABR DBR ABR DBR

Modul Module Module

e0 i n- 1

◾ Consecutive words are placed in a module.

◾ High-order k bits of a memory address determine
the module.
◾ Low-order m bits of a memory address determine
the
word within a module.
◾ When a block of words is transferred from main memory
to cache, only one module is busy at a time.
BITS Pilani, Deemed to be University under Section 3 of UGC
Methods of address layouts(Interleaving)
m bits k
bits
Address in Module MM
module address

•Consecutive words
located in are
modules. consecutive
ABR DBR ABR DBR ABR DBR
•Consecutive addresses
Modul
e0
Module
i
Modul
e 2k -
be
canlocated in consecutive
1 modules.
• While transferring a
block of data, several
memory modules can be
kept busy at the same time.

BITS Pilani, Deemed to be University under Section 3 of UGC

Exampl
e
9 0000
8 0001
00 01 10
7 0010
00 9 00 8 00 7
6 0011
01 5 01 4 01 3
5 0100
10 1 10 15 10 14
4 0101
11 12 11 11 11 10
3 0110
2 0111
1 1000
15 1001 11
14 1010 00 6
13 1011 01 2
12 1100 10 13
11 1101 11 0
10 1110
0 1111
BITS Pilani, Deemed to be University under Section 3 of UGC
Example: (Definition of
cylinders)

A disk has 24 recording surface has 14000 cylinders. There is an

average of 400 sectors per track. Each sector contains 512 Bytes
of data
What is the maximum number of bytes that can be stored?

The maximum number of bytes that can be stored on this disk is

24 X14000 X400X 512 = 68.8 X 109 bytes

What is the data transfer rate in bytes per second at a rotational

speed of 7200rpm?

The data transfer rate is (400 X 512 X 7200)/60 = 24.58 X 106

bytes/s.

BITS Pilani, Deemed to be University under Section 3 of UGC

The number of cylinders of a disk drive exactly equals the number
of tracks on a single surface in the drive. It comprises the same track
number on each platter, spanning all such tracks across each platter
surface that is able to store data

Example: Contnd
A disk has 24 recording surface has 14000
tracks. There is an
average of 400 sectors per track. Each sector
contains 512 Bytes
of data

Using a 32 bit word , suggest a suitable addressing scheme for

the disk, assuming that there are 512 bytes / Sector

Need 9 bits to identify a sector, 14 bits for a track, and 5 bits for a
surface.
Thus, a possible scheme is to use address bits A8-0 for sector,
A22-9 for track,
BITS Pilani, Deemed to be University under Section 3 of UGC
Example

What is the average time to read or write a 512-byte

sector for a typical disk rotating at 15,000 RPM? The
advertised average seek time is 4 ms, the transfer
rate is 100 MB/sec, and the controller overhead is 0.2
ms. Assume that the disk is idle so that there is no
waiting time.
Sol:
Average disk access time = average seek time + average rotational
Delay + transfer time + controller overhead
rotational latency Also
seek The process of called rotational delay.
positioning a The time required for the
read/write head over desired sector of a disk to
the proper track on a rotate under the read/write
disk & seek time is the head;
time consumed in it usually assumed to be half
the rotation time.
BITS Pilani, Deemed to be University under Section 3 of UGC
What is the average time to read or write a 512-byte
sector for a typical disk rotating at 15,000 RPM? The
advertised average seek time is 4 ms, the transfer
rate is 100 MB/sec, and the controller overhead is 0.2
ms. Assume that the disk is idle so that there is no
waiting time.
Sol:
Average disk access time = average seek time + average rotational
Delay + transfer time + controller overhead

=4.0 ms + 0.5 rotation /15,000 RPM + 0.5 KB/100 MB/sec + 0.2 ms

= 4.0 + 2.0 + 0.005 + 0.2 = 6.2 ms

If the measured average seek time is 25% of the advertised average

time, then it is
1.0 ms + 2.0 ms + 0.005 ms + 0.2 ms = 3.2 ms
BITS Pilani, Deemed to be University under Section 3 of UGC
Exampl
e:

Design a 16-bit memory of total capacity 8192 bits using SRAM chips of size 64 X 1 bit. Give the array
configuration of the chips on the memory board showing all required input and output signals for
assigning this memory to the lowest address space. The design should allow for both byte and 16-bit
word accesses.

Sol:
Number of total addressable locations of 16 bits= 8192/16= 512
no of 64 X 1 chips required 8192/64= 128 chips
There will be 128/16=8 row of 16 bits
512 can be rranged into 8 rows X 64 columns
For 8 rows no of decoder bits 3
For 64 columns no of decoder bits 6

BITS Pilani, Deemed to be University under Section 3 of UGC

Solution

BITS Pilani, Deemed to be University under Section 3 of UGC

Example

•Consider a dynamic RAM that must be given a refresh

cycle 64 times per ms. Each refresh operation requires
150 ns; a memory cycle requires 250 ns.What percenta
of the memory’s total operating time must be given to
refreshes?

BITS Pilani, Deemed to be University under Section 3 of UGC

Example

•Consider a dynamic RAM that must be given a refresh

cycle 64 times per ms. Each refresh operation requires
150 ns; a memory cycle requires 250 ns.What percenta
of the memory’s total operating time must be given to
refreshes?
•Sol:

9600ns/1ms X 100= 0.96 % ≅ 1%

• Total Time consumed in 1 ms = 64 X 150 ns= 9600 ns

%=
out of 1ms
150𝑛𝑠 ×100
refresh operation consumes250
60% 𝑛𝑠
•Out of total time of memory cycle of 250 ns
BITS Pilani, Deemed to be University under Section 3 of UGC
Figure shows a simplified timing diagram for a DRAM read
operation over a bus.

BITS Pilani, Deemed to be University under Section 3 of UGC

What is the memory cycle time?
What is the maximum data rate this DRAM can
sustain, assuming a 1-bit output?

BITS Pilani, Deemed to be University under Section 3 of UGC

What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?

Sol: memory cycle time = (60+ 40)=100 ns

BITS Pilani, Deemed to be University under Section 3 of UGC

What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?

Sol: memory cycle time = (60+ 40)=100 ns

1 bit in 100ns so 1/100ns per/ sec= 10Mbps

BITS Pilani, Deemed to be University under Section 3 of UGC

What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?

Sol: memory cycle time = (60+ 40)=100 ns

1 bit in 100ns so 1/100ns per/ sec= 10Mbps

Constructing a 32-bit wide memory system using these chips yields

what data transfer rate?

BITS Pilani, Deemed to be University under Section 3 of UGC

What is the memory cycle time?
What is the maximum data rate this DRAM can sustain, assuming a
1-bit output?

Sol: memory cycle time = (60+ 40)=100 ns

1 bit in 100ns so 1/100ns per/ sec= 10Mbps

Constructing a 32-bit wide memory system using these chips yields

what data transfer rate?
Ans 320Mbps= 320Mbps/8 = 40MB/s

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiple level
cache

Average Access Time

Given a 100 MHz machine with a with a miss penalty of 20 cycles, a

hit time of 2 cycles, and a miss rate of 5%, calculate the average
memory access time (AMAT).

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiple level cache

Given a 100 MHz machine with a with a miss penalty of 20 cycles, a

hit time of 2 cycles, and a miss rate of 5%, calculate the average
memory access time (AMAT).
AMAT = hit_time + miss_rate x miss_penalty

Since the clock rate is 100 MHz, the cycle time is:
1/(100 MHz) = 10 ns
which gives
AMAT = 10 ns x (2 + 20 x 0.05) = 30 ns

Note: Here we needed to multiply by the cycle time

because the hit_time and miss_penalty were given in
cycles. BITS Pilani, Deemed to be University under Section 3 of UGC
Multiple level
cache

Suppose doubling the size of the cache decrease the miss rate
to 3%, but causes the hit time to increases to 3 cycles and the
miss penalty to increase to 21 cycles. What is the AMAT of the
new machine?

BITS Pilani, Deemed to be University under Section 3 of UGC

Multiple level
cache
Suppose doubling the size of the cache decrease
the miss rate to 3%, but causes the hit time to
increases to 3 cycles and the miss penalty to
increase to 21 cycles. What is the AMAT of the
new machine?
For the new machine:
AMAT = 10 ns x (3 + 21 x 0.03) = 36.3 ns

BITS Pilani, Deemed to be University under Section 3 of UGC

Mass
Storage
• Many systems today need to store many
terabytes of data
• Don’t want to use single, large disk
– too expensive
– failures could be dangerous
• Would prefer to use many smaller disks

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID
• Redundant Array of Independent Disks
• Basic idea is to connect multiple disks
together to provide
– large storage capacity
– faster access to reading data
– redundant data
• Many different levels of RAID systems
– differing levels of redundancy, error checking,
capacity, and cost

BITS Pilani, Deemed to be University under Section 3 of UGC

Striping
• Take file data and map it to different disks
• Allows for reading data in parallel

file block 0 block 1 block 2 block 3

data

Disk Disk Disk Disk

0 1 2 3
BITS Pilani, Deemed to be University under Section 3 of UGC
MIRRORING
file block 0 block 1 block 2 block 3 block 4
data

1 block 1 block
0 0
sector 2 block 2 block
sector
s 1 1
s
3 block 3 block
2 2
4 block 4 block
3 3
5 block 5 block
Disk Disk
4 41
0
5 5
BITS Pilani, Deemed to be University under Section 3 of UGC
Mirrorin
g
• Keep two copies of data on two separate disks
• Gives good error recovery
– if some data is lost, get it from the other source
• Expensive
– requires twice as many disks
• Write performance can be slow
– have to write data to two different spots
• Read performance is enhanced
– can read data from file in parallel

BITS Pilani, Deemed to be University under Section 3 of UGC

Parit
y
• Way to do error checking and correction
• Add up all the bits that are 1
– if even number, set parity bit to 0
– if odd number, set parity bit to 1
• To actually implement this, do an exclusive OR of
all the bits being considered
• Consider the following 2 bytes
byte parity
10110011 1
01101010 0
• If a single bit is bad, it is possible to correct it
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID
Level-0
• Often called striping
• Break a file into blocks of data
• Stripe the blocks across disks in the system
• Simple to implement
– disk = file block % number of disks
– sector = file block / number of disks
• provides no redundancy or error detection

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level-0

file block 0 block 1 block 2 block 3 block 4

data

1 block 1 block
0 1
sector 2 block 2 block
sector
s 2 3
s
3 block 2
4 3
3 4
4 5
5
Disk Disk
0 1
BITS Pilani, Deemed to be University under Section 3 of UGC
Data Mapping For RAID 0

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level-1
• A complete file is stored on a single disk
• A second disk contains an exact copy of the file
• Provides complete redundancy of data
• Read performance can be improved
– file data can be read in parallel
• Write performance suffers
– must write the data out twice
• Most expensive RAID implementation
– requires twice as much storage space

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level-1

file block 0 block 1 block 2 block 3 block 4

data

1 block 1 block
0 0
sector 2 block 2 block
sector
s 1 1
s
3 block 3 block
2 2
4 block 4 block
3 3
5 block 5 block
Disk Disk
4 41
0
5 5
BITS Pilani, Deemed to be University under Section 3 of UGC
RAID Level 2
 Uses Bit-level striping with Hamming codes for ECC
 Number of disks required depends on exact implementation
 Only fair fault tolerance
 Advantages
 Random Read performance=fair
 Sequential Read Performance=very good
 Sequential Write performance=good
 Disadvantages
 Random Write Performance=poor
 Requires a complex controller
 High overhead for check disks
 Not used in modern systems

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level 2
RAID Level 3
 Uses byte-level striping with dedicated parity
 Requires minimum three drives to implement
 Has good fault-tolerance
 Advantages
 Random Read Performance=good
 Sequential Read performance=very good
 Sequential Write performance=fair to good
 Lowest overhead for check disks
 Disadvantages
 Random Write performance=poor
 Complex controller design

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level 3
RAID Level 4

 Uses Block-level striping with dedicated parity

 Requires minimum of 3 drives to implement
 Has good fault-tolerance
 Advantages
 Random Read Performance=very good
 Sequential Read and Write performance=good
 Lowest overhead of check disks
 Disadvantages
 Quite complex controller design
 Random write performance=poor
 Not commonly used

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level 4
RAID Level 5
 Uses Block-level striping with distributed parity
 Requires a minimum of 3 drives to implement
 Advantages
 Random Read performance=very good
 Random Write performance=fair
 Sequential Read and Write performance=good
 Lowest overhead of check disks
 Disadvantages
 Most complex controller design
 Difficult to rebuild in the event of a disk failure

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level 5
RAID 6
• Two parity calculations
• Stored in separate blocks on different disks
• User requirement of N disks needs N+2
• High data availability
– Three disks need to fail for data loss
– Significant write penalty

BITS Pilani, Deemed to be University under Section 3 of UGC

RAID Level 6

Raid Level 6 uses Block-level striping with dual

distributed parity
RAID Level 1+0

Uses Mirroring and striping without parity

RAID Level 0+1

Uses Mirroring and striping without Parity

CONTROL UNIT OPERATION
 The execution of an instruction involves the execution of a
sequence of substeps, generally called cycles
 The Fetch Cycle
 The Indirect Cycle
 The Interrupt Cycle
 The Execute Cycle
 The Instruction Cycle
 control unit of a processor performs two tasks:
 step through a series of micro-operations in the proper
sequence
 generates the control signals

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
CONTROL UNIT OPERATION
 The control signals generated by the control unit cause the
opening and closing of logic gates, resulting in the transfer of
data to and from registers and the operation of the ALU
 technique for implementing a control unit
 hardwired implementation
 Microprogrammed control
 Micro-operations are a sequence of more
fundamental operations which constitute the operation
cycle.
 A single micro-operation generally involves a transfer
between registers,
a transfer between a register and an external bus, or a
simple ALU operation
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-Operations

A computer executes a
program Fetch/execute
cycle
Each cycle has a number of steps : Called
micro- operations
Each step does very little: Atomic operation of
CPU micro-operations needed to perform the
subcycles of the instruction cycle.

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Constituent
Elements of Program
Execution

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
The Fetch
Cycle
 Memory Address Register (MAR)
▪Connected to address bus
▪Specifies address for read or write op
 Memory Buffer Register (MBR)
▪Connected to data bus
▪Holds data to write or last data read
Program Counter (PC)
▪Holds address of next instruction to be
fetched
 Instruction Register (IR)
▪Holds last instruction fetched
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Executing an Instruction

 Fetch the contents of the memory location

pointed to by the PC. The contents of this
location are loaded into the IR (fetch
phase).

IR ← [[PC]]

 Assuming that the memory is byte

addressable, increment the contents of
the PC by 4 (Assume 4 Byte instruction).
PC ← [PC] + 4

 Carry out the actions specified by the

instruction in BITS
Act, 1956 IR (execution phase).
thePilani, Deemed to be University under Section 3 of UGC
Processor
Organizati
on

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Internal
organization of
the processor
o ALU
o Registers for temporary storage
o Various digital circuits for executing
different micro operations.(gates,
MUX,decoders,counters).
o Internal path for movement of data
between ALU and registers.
o Driver circuits for transmitting signals to
external units.
o Receiver BITS
circuits for
Pilani, Deemed
Act, 1956
incoming
to be University signals
under Section 3 of UGC
PC:
 Keeps track of execution of a program
Contains the memory address of the next instruction to be
fetched and executed.
MAR:
Holds the address of the location to be accessed.
I/P of MAR is connected to Internal bus and an O/p to
external bus. MDR:
Contains data to be written into or read out of the
addressed location.
IT has 2 inputs and 2 Outputs.
Data can be loaded into MDR either from memory bus or
from internal processor bus.
The data and address lines are connected to the internal bus via
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Register
s:
The processor registers R0 to Rn-1 vary considerably from one

processor to another.
Registers are provided for general purpose used by
programmer.
Special purpose registers-index & stack registers.

Registers Y,Z &TEMP are temporary registers :used by

processor during the execution of some instruction.

Multiplexer:
Select either the output of the register Y or a constant value 4 to be

provided as input A of the ALU.

BITS Pilani, Deemed to be University under Section 3 of UGC
 Act, 1956
ALU:
Used to perform arithmetic and logical operation.
Data Path:
The registers, ALU and interconnecting bus are
collectively referred to as the data path.

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Register
Transfer
 The input and output
s gates for register Ri are
controlled by signals in
Rin and Riout .
 Rin Is set to1 – data
available on common
bus are loaded into Ri.
 Riout Is set to1 – the
contents of register are
placed on the bus.
 Riout Is set to 0 – the
bus can be used for
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Data transfer between two registers:
EX:

Transfer the contents of R1 to R4.

1. Enable output of register R1 by setting R1out=1. This places the

contents of R1 on the processor bus.

2. Enable input of register R4 by setting R4in=1. This loads the

data from the processor bus into register R4.

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Fetch
Sequence
(symbolic)
t1
:
MAR <- (PC)
MBR <-
t2 (memory)
:
PC <- (PC) +n
t3
: IR <- (MBR)
(tx = time unit/clock
cycle) or
t1 MAR <- (PC)
:
MBR <-
t2
(memory) PC
:
<- (PC) +n
t3
Assume
: IRn<-
byte
(MBR)
instruction
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Rules for Clock Cycle Grouping

Proper sequence must be followed

▪MAR <- (PC) must precede MBR <-
(memory) Conflicts must be avoided
▪Must not read & write same register at
same time
▪MBR <- (memory) & IR <- (MBR) must not be
in same cycle
Also: PC <- (PC) +n involves addition
▪Use ALU
▪May need additional micro-operations
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Sequence of Events, Fetch
Cycle

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Interrupt
Cycle
t1 MBR <-(PC)
:
MAR <- save-
t2 address
:
PC <- routine-
t3
address
:
This is a minimum
memory <- (MBR)
▪May be additional micro-ops to get addresses
▪N.B. saving context is done by interrupt handler routine,
not micro- ops

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
The Execute Cycle

Different for each instruction

e.g. ADD R1,X - add the contents of location X to
Register 1 , result in R1
t1: MAR <- (IRaddress)
t2: MBR <-
(memory) t3: R1 <-
R1 + (MBR)
Note no overlap of
micro-operations

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Performing an Arithmetic or Logic
Operation  The ALU is a combinational
R3=
R1+R2 circuit that has no internal
storage.
 ALU gets the two
operands from MUX and
bus. The result is
temporarily stored in
register Z.
 What is the sequence of
operations to add the
contents of register R1 to
those of R2 and store the
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956 result in R ?
Performing an Arithmetic or Logic
Operation Step 1: Output of the register
R3=
R1+R2 R1 and input of the register Y
are enabled, causing the
contents of R1 to be
transferred to Y.
Step 2: The multiplexer’s select
signal is set to select Y causing the
multiplexer to gate the contents of
register Y to input A of the ALU. At
the same time contents of R2 are
gated into the bus and hence to
input B
Step 3: The contents of Z are
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Register
Transfers
Input and output gating for one register
bit.
Rin clk Q

0 Q

Io Q
1 B
B

When Rout=1 B=Q.

otherwise in high
impedance state

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Fetching a Word from
Memory
. Connection and control signals for register MDR

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Instruction
Cycle
Each phase decomposed into sequence of elementary micro-
operations
E.g. fetch, indirect, and interrupt cycles Execute cycle
▪One sequence of micro-operations for each opcode
Assume new 2-bit register called ICC
▪Instruction cycle code (ICC) designates which part of cycle
processor is in
– 00: Fetch
– 01: Indirect
– 10: Execute
– 11: Interrupt

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Flowchart for
Instruction
Cycle

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Requirements for CONTROL OF THE
PROCESSOR
the following three-step process leads to a characterization of
the control unit
1. Define the basic elements of the processor.
2. Describe the micro-operations that the processor performs.
3. Determine the functions that the control unit must
perform to cause the micro-operations to be performed
The control unit performs two basic tasks:
•Sequencing: The control unit causes the processor to step
through a series of micro-operations in the proper sequence,
based on the program being executed.
•Execution: The control unit causes each micro-
operation to be performed
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Control Unit Clock
Block Diagram of the Control ▪One micro-instruction (or set of
Unit
parallel micro-instructions) per
clock cycle
Instruction register
▪Op-code for current instruction
▪Determines which micro-
instructions are performed
Flags
▪State of CPU
▪Results of previous operations
From control bus
▪Interrupts
▪Acknowledgements
BITS Pilani, Deemed to be University under Section 3
of UGC Act, 1956
Example Control Signal Sequence - Fetch

MAR <- (PC)

▪Control unit activates signal to open gates between PC
and MAR
MBR <- (memory)
▪Open gates between MAR and address bus
▪Memory read control signal
▪Open gates between data bus and MBR

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Data Paths and Control Signals

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Control Unit with Decoded
Inputs

BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee

Hardwired control
Signal
Let us consider a single control signal, C5. This signal causes data to be
read from the external data bus into the MBR
PQ = 11 Instruction C; PQ = 10 EC ;PQ = 01 Indirect Cycle; PQ =
00 FC

Then the following Boolean expression defines C5

This expression is not complete. C5 is also needed during the

execute cycle. For our simple example, let us assume that there
are only three instructions that read from memory: LDA, ADD, and
AND. Now we can define C5

This same process could be repeated for every control signal

generated by the processor

BITS Pilani, Deemed to be University under Section 3 of UGC

Hardwired
Implementation (1)

Control unit inputs

Flags and control bus

–Each bit means

something Instruction

– Decoder takes encoded input and produces single

output
– n binary inputs and 2n outputs
BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee
Hardwired
Implementation (2)
Clock
– Repetitive sequence of pulses

– Useful for measuring duration of micro-ops

– Must be long enough to allow signal propagation

– Different control signals at different times within

instruction cycle
– Need a counter with different control signals for t1, t2
etc.

BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee

Detailed Block
Description

BBIITTSS PPiillaannii,, DDeeeemmeedd ttoo bbee

Generating
Zin
Control sequence for an unconditional branch
instruction

Step Action
Add (R3),
1 PCout , MAR in , Read, Select4,Add, R1
2 Z in

3 Zout , PCin , Yin , WMF

4 C MDRout , IRin

5 Offset-field-of-IRout,

Add, Zin
Zout, PCin , End

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Generating
Zin

Branc Ad
h d

T4
T6

Zin = T1 + T6 • ADD + T4 • BR
+…
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Problems With Hard Wired
Designs
 Complex sequencing & micro-operation
logic
 Difficult to design and test
 Inflexible design
 Difficult to add new instructions

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Generating
End
End = T7 • ADD + T5 • BR + (T5 • N + T4 • N) •
BRN + …

Control Step counter to its starting value

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Microprogrammed Control
Control signals are generated by a program similar to machine language
programs. Control Word (CW); Microroutine; individual
microinstruction

Add R1,
(R3)

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Micro-programmed Control Implementation
 Use sequences of instructions to control complex
operations Called micro-programming or firmware
 All the control unit does is generate a set of control
signals
 Each control signal is on or off
 Represent each control signal by a bit
 Have a control word for each micro-operation
 Have a sequence of control words for each machine
code
instruction
 Add an address to specify the next micro-
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-programmed Control Implementation

Today’s large microprocessor

▪Many instructions and associated register-level
hardware
▪Many control points to be
manipulated This results in
control memory that
▪Contains a large number of
words
– corresponding to the number of
instructions to be executed
▪Has a wide word width
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-
program
 Word Length Based on 3 factors
▪Maximum number of simultaneous micro-operations
supported
▪The way control information is represented or encoded
▪The way in which the next micro-instruction address is
specified
 Types
Each micro-instruction specifies single (or few) micro-
operations to be performed
▪ (vertical micro-programming)
Each micro-instruction specifies many different micro-
operations to be performed in parallel
▪(horizontal micro-programming)
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Micro-
program
 Vertical Micro-programming
 Width is narrow
 n control signals encoded into log2 n bits
 Limited ability to express parallelism
 Considerable encoding of control information
requires external memory word decoder to identify
the exact control line being manipulated
 Horizontal Micro-programming
 Wide memory word
 High degree of parallel operations possible
 Little encoding of control information

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Typical Microinstruction
Formats

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Micro-program
Organization ofControl
Memory
Compromise

 Divide control signals into

disjoint groups
 Implement each group
as separate field in
memory word
 Supports reasonable
levels of parallelism
without too much
complexity

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Execution of a Complete Instruction (self study)
 Add R1, (R3)
 Fetch the instruction
 Fetch the first operand (the contents of the
memory location pointed to by R3)
 Perform the addition
 Load the result into R1

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Execution of a Complete
Instruction Add R1,
(R3)

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Data Flow,
Fetch Cycle Add R1,
(R3)

WMFC: Wait for memory function check to accommodate the

variability in response time, The addressed device sets this
signal to 1 to indicate that contents of the specified location has
been read BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Example
Write the sequence of control steps needed to add the immediate
number to R1 as per the bus structure shown

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Example

1. PCout, MARin, Read, Select4, Add,

Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. PCout, MARin, Read, Select4, Add,
Zin
5. Zout, PCin, Yin
6. R1out, Yin, WMFC
7. MDRout, SelectY, Add, Zin
8. Zout, R1in, End

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Execute Cycle (ISZ)

•ISZ X - increment and skip if zero

▪t•The content
MAR of) location X is incremented by 1. If the re
<- (IRaddress
1:
is 0, the next <-
MBR
▪t
(memory) MBR
•instruction
2: is skipped. A possible sequence of micro-
<- (MBR) +is
▪toperations 1
▪3: memory <-
▪t
Notes: (MBR)

▪4:
if is a single if (MBR)
micro-
operation == 0
then PC
▪Micro-operations done
<- (PC)
during t4 + 1 BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execute Cycle (BSA)
BSA X - Branch and save address
▪Address of instruction following BSA is
saved in X
▪
▪Execution continues from X+1
t MAR <- (IRaddress)
▪1: MBR <- (PC)
▪t PC <- (IRaddress)
▪2: memory <-
▪t (MBR) PC <- (PC) + 1
3:
The address in the PC at the start of the instruction is the address of the next
instruction in sequence. This is saved at the address designated in the IR.
The latter address is also incremented to provide the address of the
instruction for the next instruction cycle
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Multiple-Bus
Organization
•Allow the contents of two
different registers to be
accessed simultaneously and
have their contents placed on
buses A and B.

•Allow the data on bus C to

be loaded into a third
register during the same
clock cycle.

• Incrementer unit.

•ALU simply passes one of

its two
input operands unmodified
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
to bus C
Add R6, R4, R5

Step Action

1 PCout, R=B, MAR in , Read,

2 IncPC
WMFC
3 MDR outB , R=B, IR
4 in SelectA, Add, R6in ,
R4outA,
End
R5outB,

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
What is Superscalar?
 Common instructions (arithmetic, load/store, conditional
branch) can be initiated and executed independently
 Equally applicable to RISC & CISC
 In practice usually RISC

In this example, two integer, two floating-point, and one memory (either
load or store) operations can be executing at the same time.

BITS Pilani, Deemed to be University under Section 3 of UGC

Superscalar v Superpipelined
FI DI EI WBSimple 4-stage pipeline

Superpipelined
 Many pipeline stages
need less than half a
clock cycle
 Double internal
Degree 2
clock speed gets two
tasks per
external clock cycle
 Superscalar allows
parallel fetch execute

BITS Pilani, Deemed to be University under Section 3 of UGC

Superscalar approach
Instruction level
parallelism Compiler
based optimisation
Hardware techniques
Limited by
▪True data dependency
▪Procedural
dependency
▪Resource conflicts
▪Output dependency
▪Antidependency BITS Pilani, Deemed to be University under Section 3 of UGC
Effect of Dependencies
 True Data Dependency
ADD r1, r2 (r1 :=
r1+r2;)
MOVE r3,r1 (r3 := r1;) Degree 2
 Can fetch and decode second
instruction in parallel with first
 Can NOT execute second
instruction until first is
finished
 Procedural
 Can not execute instructions
Dependency
after a branch in parallel with
instructions before a branch
 Also, if instruction length is not
fixed, instructions have to be
decoded to find out how many
fetches are needed
BITS Pilani, Deemed to be University under Section 3 of UGC
Effect of
Dependencies
 Resource Conflict
 Two or more instructions Degree 2

requiring access to the same

resource at
the same time
e.g. two arithmetic
instructions Can duplicate
resources
 e.g. have two arithmetic
units

BITS Pilani, Deemed to be University under Section 3 of UGC

Dependency due to memory Cycle
MOV EAX, ; load register EAX with the
eff
contents of effective memory
MOV EBX, address eff
EAX
 ;load EBX with the contents of
For RISC processor
 two or more cyclesEAX
when the load is a cache hit
 tens or even hundreds of cycles for a cache miss
 sol: for the compiler to reorder instructions
 This scheme is less effective in the case of a superscalar pipeline:
 The independent instructions executed during the load are likely to
be executed
on the first cycle of the load, leaving the processor with nothing to do
until the
load completes

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Design Issues:Instruction-Level Parallelism and Machine Pa
 Instruction level parallelism
 Instructions in a sequence are
independent
 Execution can be overlapped
 Governed by data and procedural
Example
dependency
Independent
Load R1 ← R2 Showing
Dependency Add
Add R3 ← R3, R4 ← R3, R2 Add
“1” Add R3 ← R4 ← R4, R2
Store [R4] ← R0
R3, “1”
 Machine Parallelism
 Ability to take advantage of instruction level
parallelism
 Governed by number of parallel pipelines
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Issue Policy

 Instruction issue policy to refer to the protocol

used to issue instructions
 Three types of orderings are important
 Order in which instructions are fetched
 Order in which instructions are executed
 Order in which instructions change
registers and memory
 the processor must accommodate the
various
dependencies and conflicts.
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
superscalar instruction issue policies
 In-order issue with in-order completion
 In-order issue with out-of-order
completion
 Out-of-order issue with out-of-order
completion

In-Order Issue In-Order Completion

 Issue instructions in the order they
occur
 Not very efficient
 May fetch >1 instruction
 Instructions must stall if necessary

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
In-Order Issue In-Order Completion
(Diagram)
 I1 requires two cycles two execute
 I3 & I4 conflict for the same
functional unit
 I5 depends on the value produced
by I4
 I5 and I6 conflict for a functional unit

Superscalar Instruction Issue and Completion

Policies BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
In-Order Issue Out-of-Order
Completion (Diagram)

Superscalar Instruction Issue and Completion

Policies

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
In-Order Issue Out-of-Order
Completion
Output dependency
▪ I1: R3 ← R3 op R5
▪I2: R4 ← R3 + 1
▪ I3: R3 ← R5 + 1
▪I4: R7 ← R3 op R4
▪ I2 depends on result of I1 – True data dependency: Instruction
I2 cannot execute before instruction I1
▪If I3 completes before I1, the result from I1 will be wrong -
output (read-write)
dependency
▪I4 must wait for I3, because it uses a result pro duced by I3.
▪ I3 must complete after I1 to produce the correct output values
in I4
▪Eg. Of a Write after write (WAW) dependency.
▪ Here more difficult to deal with instruction interrupts and exceptions
as at the time of interruption, instructions
BITS Pilani, Deemed ahead
to be University of the
under Section instruction
3 of UGC
Act, 1956
Out-of-Order Issue Out-of-Order
Completion

 Decouple decode pipeline from execution pipeline

 Can continue to fetch and decode until this pipeline is full
 When a functional unit becomes available an
instruction can be executed
 Since instructions have been decoded, processor can
look ahead

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Antidependency
Read after-write(RAW) dependency
I1: R3 ← R3 op
R5 I2: R4 ← R3
+ 1 I3: R3 ← R5
+ 1 I4: R7 ← R3
op R4
 Instruction I3 cannot complete execution before
instruction I2 begins execution and has fetched its
operands. This is so because I3 updates register R3,
which is a source operand for I2.
 The term antidependency is used because the
constraint is similar to that of a true data
BITS Pilani, Deemed to be University under Section 3 of UGC
dependency, but reversed: Act, 1956
Reduced Instruction Set
Computer
 Key features
 Large number of general purpose registers
 or use of compiler technology to optimize register use
 Limited and simple instruction set
 Emphasis on optimising the instruction pipeline
 All operations on data apply to data in registers
 The only operations that affect memory are
load and store operations
 The instruction formats are few in number, with
all instructions typically being one size.

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Basic Pipelining: A "Typical" RISC IS
 32-bit fixed format instruction (3 formats)
ALU instructions, Load/Store, Branches and Jumps
 32 32-bit GPR (R0 contains zero, DP take pair)
 3-address, reg-reg arithmetic instruction
 Single address mode for load/store: base +
displacement
• – no indirection
 Simple branch conditions /Jump
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
A simple implementation of a RISC Instruct
Set Every instruction can be implemented in at most 5 cycles.
▪Instruction fetch cycle (IF): send PC to memory and fetch the current
instruction; Update the PC to the next PC by adding 4.
▪Instruction decode/register fetch cycle (ID): decode the instruction and
read the registers. For a branch, do the equality test on the
registers, and compute the target address
▪Execution/effective address cycle (EX): The ALU operates on the
operands prepared in the prior cycle, performing:
– Memory reference: ALU adds the base register and the offset.
LW 100(R1) => 100 + R1
– Register-Register ALU instruction: ALU performs the operation
specified by the opcode on the values read from the register file.
ADD R1,R2,R3 => R1 + R2
– Register-Immediate ALU instruction: ALU performs the operation
specified by the opcode on the first values read from the register file
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
A simple implementation of a RISC
Instruction Set
Every instruction can be implemented in at most 5 cycles.
▪Memory access (MEM). Read the memory using the
effective address computed in the previous
cycle if the instruction is a load; If it is a store, write
the data to the memory
▪Write-back cycle (WB). For Register-Register ALU
instruction or
load instruction, write the result into the register file,
whether it comes from the memory (for a load) or from
the ALU (for an ALU instruction)
Branch instructions require 2 cycles, store instructions
require 4 cycles, others require 5 cycles.

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
An example of an instruction
execution
ADD R3, R1, => R3 = R1 +
R2 R2;

U
AL
P
C

Reg File
ory
Mem

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Instruction Fetch (IF)

ADD R3, R1, => R3 = R1 +

R2 R2;

U
AL
P
C 000001000010001000011
…
00001100011111110000

Reg File
0…
y
Memor

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;

U
AL
P
C 000001000010001000011
…
00001100011111110000

Reg File
0…
y
Memor

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…

P
C 000001000010001000011
…
00001100011111110000

Reg File
0…
y
Memor
Fetch the
instruction

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Instruction Fetch (IF)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…

000001000010001000011
P …
C 00001100011111110000

Reg File
0…
Fetch the
y
Memor

instruction Move PC
to the next
instruction

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…
Reg
File
0
R
x
0
y

g e
Re Fil
R
z
1
ory
Mem

R
R3
2 …
1
R
3 BITS Pilani, Deemed to beAct,
University under Section 3 of UGC
1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…
Reg
File
ADD R 0
0 x
R y

g e
Re Fil
decode the
1 z instruction
ory
Mem

R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…
Reg
File
ADD R 0
R1
0 x
R y

g e
Re Fil
decode the
1 z instruction
ory
Mem

R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…
Reg
File
ADD R1 R 0
R2
0 x
R y

g e
Re Fil
decode the
1 z instruction
ory
Mem

R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;

IR
000001000010001000011

U
AL
…
Reg File

ADD R1 R2 R3 R0 0
R x
1 y

g e
Re Fil
R z
ory
Mem

2
R
R3 …
13

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Instruction Decode (ID)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011

U
AL
…
y
Reg
File
ADD R1 R2 R3 0
R0
x
R
y

g e
Re Fil
1
z
R
ory
Mem

2
R3
R …
1
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Execution (EX)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011

U
AL
…
y

Reg File
ory
Mem

BITS Pilani, Deemed to be University under Section 3 of UGC

Act, 1956
Execution (EX)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011 x+

U
AL
… y
y

Execute the
operation

Reg File
ory
Mem

BITS Pilani, Deemed to be University under Section 3

of UGC Act, 1956
Writeback (WB)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011 x+

U
AL
… y
y
Reg
File
R 0
0 x
R y Write the result to the

g e
Re Fil
register
1 z
ory
Mem

R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956
Writeback (WB)
ADD R3, R1, => R3 = R1 +
R2 R2;
x
IR
000001000010001000011 x+

U
AL
… y
y
Reg
File
R 0
0 x
R y Write the result to the

g e
Re Fil
register
1 x+y
ory
Mem

R
2
R3 …
1R
3
BITS Pilani, Deemed to be University under Section 3 of UGC
Act, 1956

Computer Architecture CS F342 Ca-Lect1
No ratings yet
Computer Architecture CS F342 Ca-Lect1
21 pages
Comp Arch Ch1 Ch2 Ch3 Ch4
No ratings yet
Comp Arch Ch1 Ch2 Ch3 Ch4
161 pages
Microprocessor System - Prelim
No ratings yet
Microprocessor System - Prelim
7 pages
Computer Organization and Architecture Vishnu COA1
No ratings yet
Computer Organization and Architecture Vishnu COA1
136 pages
Computer Architecture and Organization: General Introduction
No ratings yet
Computer Architecture and Organization: General Introduction
72 pages
Module 1a A Brief History of Computer Architecture
No ratings yet
Module 1a A Brief History of Computer Architecture
53 pages
L-1 (History of Computer)
No ratings yet
L-1 (History of Computer)
75 pages
Contact: Computer Organization and Architecture
No ratings yet
Contact: Computer Organization and Architecture
14 pages
Arch Leacture 1
No ratings yet
Arch Leacture 1
52 pages
Unit-1 ACA
No ratings yet
Unit-1 ACA
86 pages
CH012 COA9e Modified
No ratings yet
CH012 COA9e Modified
51 pages
Computer Architecture Basics
No ratings yet
Computer Architecture Basics
61 pages
Introduction To COmputer Architecture
No ratings yet
Introduction To COmputer Architecture
20 pages
Chapter 1 Introduction To COA
No ratings yet
Chapter 1 Introduction To COA
65 pages
Computer Architecture and Organization
No ratings yet
Computer Architecture and Organization
61 pages
Introduction To Computer and C Programming Topic - 1 Computer Evolution and Performance
No ratings yet
Introduction To Computer and C Programming Topic - 1 Computer Evolution and Performance
16 pages
1 Introduction
No ratings yet
1 Introduction
31 pages
Merged Slides
No ratings yet
Merged Slides
326 pages
Module 1A (19-10-2024)
No ratings yet
Module 1A (19-10-2024)
62 pages
Computer Architecture History
No ratings yet
Computer Architecture History
75 pages
Introduction To Computer Architecture: Instructor
No ratings yet
Introduction To Computer Architecture: Instructor
46 pages
Introduction To Computer Architecture: Instructor
No ratings yet
Introduction To Computer Architecture: Instructor
46 pages
2 Introduction 11 Jul 2019material I 11 Jul 2019 Module 1
No ratings yet
2 Introduction 11 Jul 2019material I 11 Jul 2019 Module 1
54 pages
Slide CA Chapter 1 Lecture 1
No ratings yet
Slide CA Chapter 1 Lecture 1
51 pages
Computer Organization and
No ratings yet
Computer Organization and
50 pages
BCS1373 - Chap 1 Intro
No ratings yet
BCS1373 - Chap 1 Intro
42 pages
CPE132 Computer Architecture
No ratings yet
CPE132 Computer Architecture
52 pages
Lect 01 PDF
No ratings yet
Lect 01 PDF
33 pages
Chapter 1 - Basic Concepts and Computer Evolution
No ratings yet
Chapter 1 - Basic Concepts and Computer Evolution
23 pages
Lecture 1 Introduction To Computer Architecture and Organization
No ratings yet
Lecture 1 Introduction To Computer Architecture and Organization
69 pages
Brief History of Computer Evolution
No ratings yet
Brief History of Computer Evolution
13 pages
Microprocessor System - Prelim
No ratings yet
Microprocessor System - Prelim
7 pages
Week-1 (History of Computer)
No ratings yet
Week-1 (History of Computer)
77 pages
Lec01 - Introduction
No ratings yet
Lec01 - Introduction
45 pages
Chapter 1
No ratings yet
Chapter 1
59 pages
Computer Organization and Architecture
0% (1)
Computer Organization and Architecture
49 pages
Computer Architecture and Organization Reviewer
No ratings yet
Computer Architecture and Organization Reviewer
14 pages
Chapter One - Introduction
No ratings yet
Chapter One - Introduction
56 pages
Compacrchlec Prelims
No ratings yet
Compacrchlec Prelims
13 pages
Lecture1 Orkom 19
No ratings yet
Lecture1 Orkom 19
33 pages
Computer Organization Computer Architecture
No ratings yet
Computer Organization Computer Architecture
23 pages
Introduction To Computer Architecture
No ratings yet
Introduction To Computer Architecture
17 pages
MS Computer Architecture Course
No ratings yet
MS Computer Architecture Course
49 pages
Computer Architecture NCS-505
No ratings yet
Computer Architecture NCS-505
55 pages
Summary
No ratings yet
Summary
6 pages
Computer Architecture Essentials
No ratings yet
Computer Architecture Essentials
80 pages
Computer Architecture 1
No ratings yet
Computer Architecture 1
10 pages
Chapter - 1 DTT 210
No ratings yet
Chapter - 1 DTT 210
55 pages
Computer Architecture Course Guide
No ratings yet
Computer Architecture Course Guide
109 pages
Computer Architecture Basics
No ratings yet
Computer Architecture Basics
26 pages
Chapter 1 Computer Architecture and Organization
No ratings yet
Chapter 1 Computer Architecture and Organization
116 pages
Chap 01 - Intro To Computer Systems
No ratings yet
Chap 01 - Intro To Computer Systems
81 pages
Lecture 1 Introduction
No ratings yet
Lecture 1 Introduction
33 pages
Coa Notes 2018
No ratings yet
Coa Notes 2018
262 pages
CSPC30 Prelim Reviewer
No ratings yet
CSPC30 Prelim Reviewer
6 pages
Lec 1
No ratings yet
Lec 1
27 pages
Unit I
No ratings yet
Unit I
13 pages
Lecture 03, Computer Architecture, CS 252
No ratings yet
Lecture 03, Computer Architecture, CS 252
32 pages
I Week III Von Neumann Structure, Expanded Structure of IAS Computer
No ratings yet
I Week III Von Neumann Structure, Expanded Structure of IAS Computer
62 pages
Operation Instructions of Grinding Plant
100% (2)
Operation Instructions of Grinding Plant
389 pages
Become A Local Seller On Daraz
100% (1)
Become A Local Seller On Daraz
19 pages
FINAL ASSIGNMENT 3 Swarali Final
No ratings yet
FINAL ASSIGNMENT 3 Swarali Final
11 pages
Hurp
No ratings yet
Hurp
3 pages
Simple Show - BluffTitler
No ratings yet
Simple Show - BluffTitler
3 pages
Product Lifecycle and End of Life Information For Broadcom, Symantec, and VMware Products
No ratings yet
Product Lifecycle and End of Life Information For Broadcom, Symantec, and VMware Products
2 pages
Enhancing Data Management: An Integrated Solution For Database Backup, Recovery, Conversion, and Encryption Capabilities
No ratings yet
Enhancing Data Management: An Integrated Solution For Database Backup, Recovery, Conversion, and Encryption Capabilities
15 pages
KEMROC New Products (EN)
No ratings yet
KEMROC New Products (EN)
5 pages
Mixer b20c
No ratings yet
Mixer b20c
8 pages
9th Maths Minor Test 3
No ratings yet
9th Maths Minor Test 3
4 pages
Heat Exchangers 304 SS: CEPCI (2010) 532.9 CEPCI (2020) 599.5
No ratings yet
Heat Exchangers 304 SS: CEPCI (2010) 532.9 CEPCI (2020) 599.5
6 pages
Data Structures Question Bank R20-II EEE
No ratings yet
Data Structures Question Bank R20-II EEE
5 pages
Arduino Rocket Flight Computer
No ratings yet
Arduino Rocket Flight Computer
7 pages
SATIP E 004 01 Heat Exchanger
No ratings yet
SATIP E 004 01 Heat Exchanger
10 pages
ERPCODE Inc
No ratings yet
ERPCODE Inc
5 pages
Tutorial de OpenFOAM Con Ubuntu Linux 12.04
No ratings yet
Tutorial de OpenFOAM Con Ubuntu Linux 12.04
68 pages
COMM 337U - Communication and Gender: Syllabus - Portland State University - Spring 2011 - Tierney
No ratings yet
COMM 337U - Communication and Gender: Syllabus - Portland State University - Spring 2011 - Tierney
6 pages
Operating Systems for Beginners
No ratings yet
Operating Systems for Beginners
7 pages
Maintenance Manual of Chery A113 - Chassis PDF
50% (2)
Maintenance Manual of Chery A113 - Chassis PDF
45 pages
PaperCut MF - Fuji Xerox Embedded Manual - 2020-05-15
No ratings yet
PaperCut MF - Fuji Xerox Embedded Manual - 2020-05-15
46 pages
Urea Prilling Tower Design
100% (8)
Urea Prilling Tower Design
5 pages
2011 PP
100% (1)
2011 PP
56 pages
Change Your Password
No ratings yet
Change Your Password
1 page
De La Salle University: Canvas Helps Reimagine and Improve Teaching and Learning
No ratings yet
De La Salle University: Canvas Helps Reimagine and Improve Teaching and Learning
2 pages
Design and Structural Analysis of Car Alloy Wheel Using With Various Materials
No ratings yet
Design and Structural Analysis of Car Alloy Wheel Using With Various Materials
7 pages
Process Domain 2 - Procurement
100% (1)
Process Domain 2 - Procurement
56 pages
Negotiation 8th Edition Roy Lewicki David Saunders Bruce Barry Official Test Bank
No ratings yet
Negotiation 8th Edition Roy Lewicki David Saunders Bruce Barry Official Test Bank
326 pages
H2 - Schedule - 6 - Rolling Shutter
No ratings yet
H2 - Schedule - 6 - Rolling Shutter
10 pages
Opt 4001
No ratings yet
Opt 4001
45 pages
1TL0001
No ratings yet
1TL0001
28 pages