William Stallings
Computer Organization
and Architecture
8th Edition
Chapter 1
Basic Concepts and Computer Evolution
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Lecture Outcomes
Understanding of:
• Architecture & Organization
• Concept of computer as a hierarchical system using structure and
functions.
• The functions of each component in a computer
Architecture & Organization
• Computer Architecture refers to those attributes of a system visible to
the programmer or, put another way, those attributes that have a direct
impact on the logical execution of a program
– Instruction set, number of bits used for data representation, I/O mechanisms,
addressing techniques.
– A term that is often used interchangeably with computer architecture is
instruction set architecture (ISA)
• Computer organization refers to the operational units and their
interconnections that realize the architectural specifications.
– Organizational attributes include those hardware details transparent to the
programmer, such as control signals; interfaces between the computer and
peripherals; and the memory technology used.
IBM System/370 Architecture
• All members of the Intel x86 family share the same basic architecture
• All members of the IBM System/370 family share the same basic architecture
• Organization differs between different versions
• IBM System/370 architecture
– Was introduced in 1970
– Included a number of models
– Customers could upgrade to a more expensive, faster model without having to abandon
original software
– New models are introduced with improved technology, but retain the same
architecture so that the customer’s software investment is protected
– Architecture has survived to this day as the architecture of IBM’s mainframe product
line
Structure & Function
Hierarchical system
• Set of interrelated subsystems
• Hierarchical nature of complex systems is essential to both their design and their description
• Designer need only deal with a particular level of the system at a time
• Concerned with structure and function at each level
Structure
• The way in which components relate to each other
Function
• The operation of individual components as part of the structure
Function
There are four basic functions that a computer can perform:
Data processing
Data may take a wide variety of forms and the range of processing requirements
is broad
Data storage
Short-term
Long-term
Data movement
Input-output (I/O) - when data are received from or delivered to a device
(peripheral) that is directly connected to the computer
Data communications – when data are moved over longer distances, to or from a
remote device
Control
A control unit manages the computer’s resources and orchestrates the
performance of its functional parts in response to instructions
Structure
There are four main structural components of the computer:
• CPU – controls the operation of the computer and performs its data processing functions
• Main Memory – stores data
• I/O – moves data between the computer and its external environment
• System Interconnection – some mechanism that provides for communication among CPU, main memory, and I/O
Structure
Major structural components of the CPU:
• Control Unit – controls the operation of the CPU and hence the computer
• Arithmetic and Logic Unit (ALU) – performs the computer’s data processing function
• Registers – provide storage internal to the CPU
• CPU Interconnection – some mechanism that provides for communication among the control unit, ALU, and registers
Multicore Computer Structure
Central processing unit (CPU)
Portion of the computer that fetches and executes instructions
Consists of an ALU, a control unit, and registers
Referred to as a processor in a system with a single processing unit
Core
An individual processing unit on a processor chip
May be equivalent in functionality to a CPU on a single-CPU system
Specialized processing units are also referred to as cores
Processor
A physical piece of silicon containing one or more cores
Is the computer component that interprets and executes instructions
Referred to as a multicore processor if it contains multiple cores
Cache Memory
Multiple layers of memory between the processor and main memory
Is smaller and faster than main memory
Used to speed up memory access by placing in the cache data from
main memory that is likely to be used in the near future
A greater performance improvement may be obtained by using
multiple levels of cache, with level 1 (L1) closest to the core and
additional levels (L2, L3, etc.) progressively farther from the core
Motherboard with Two Intel Quad-Core Xeon Processors
Same Architecture Different Microarchitecture
AMD Phenom X4
•X86 Instruction Set
•Quad Core
•125W
•Decode 3 Instructions/Cycle/Core
•64KB L1 I Cache, 64KB L1 D Cache
•512KB L2 Cache
•Out-of-order
•2.6GHz
Intel Atom
•X86 Instruction Set
•Single Core
•2W
•Decode 2 Instructions/Cycle/Core
•32KB L1 I Cache, 24KB L1 D Cache
•512KB L2 Cache
•In-order
•1.6GHz
First Generation: ENIAC and EDVAC
EDVAC
Structure of von Neumann machine
IAS Memory Format
• Memory buffer register (MBR) – contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
• Memory address register (MAR) – specifies the address in memory of the word to be written from or read into the MBR
• Instruction register (IR) – contains the 8-bit opcode of the instruction being executed
• Instruction buffer register (IBR) – employed to temporarily hold the right-hand instruction from a word in memory
• Program counter (PC) – contains the address of the next instruction pair to be fetched from memory
• Accumulator (AC) and multiplier quotient (MQ) – employed to temporarily hold operands and results of ALU operations
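The register roles above can be illustrated with a small sketch of how an instruction pair is unpacked. It assumes the usual description of the IAS word layout, which this slide does not state: a 40-bit word holding two 20-bit instructions, each an 8-bit opcode followed by a 12-bit address. The function names and the sample word are hypothetical.

```python
WORD_BITS = 40      # assumed IAS word size (not stated on the slide)
INSTR_BITS = 20     # assumed: two instructions per word
ADDRESS_BITS = 12   # assumed: 8-bit opcode followed by a 12-bit address


def split_instruction(instr):
    """Return (opcode, address) for one 20-bit instruction."""
    opcode = (instr >> ADDRESS_BITS) & 0xFF
    address = instr & ((1 << ADDRESS_BITS) - 1)
    return opcode, address


def decode_ias_word(word):
    """Split a 40-bit word into left and right (opcode, address) pairs."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("an IAS word must fit in 40 bits")
    left = word >> INSTR_BITS                # opcode goes to the IR, address to the MAR
    right = word & ((1 << INSTR_BITS) - 1)   # held in the IBR until the left one is done
    return split_instruction(left), split_instruction(right)


# Hypothetical word: opcode 0x01 / address 0x0FA on the left,
# opcode 0x05 / address 0x0FB on the right.
sample = (0x01 << 32) | (0x0FA << 20) | (0x05 << 12) | 0x0FB
left_instr, right_instr = decode_ias_word(sample)
print(left_instr, right_instr)
```

Because one memory word carries two instructions, the PC addresses instruction pairs, and the IBR avoids a second memory fetch for the right-hand instruction.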
Structure of IAS
The IAS Instruction Set
Table 1.1 The IAS Instruction Set
IAS Machine. Design directed by John von Neumann. First booted in Princeton NJ in 1952.
Smithsonian Institution Archives (Smithsonian Image 95-06151)
Second Generation: Transistors
Computer Generations
Second Generation Computers
Introduced:
More complex arithmetic and logic units and control units
The use of high-level programming languages
Provision of system software which provided the ability to:
Load programs
Move data to peripherals
Libraries perform common computations
Third Generation: Integrated Circuits
1958 – the invention of the integrated circuit
Discrete component
Single, self-contained transistor
Manufactured separately, packaged in their own containers, and soldered or wired
together onto masonite-like circuit boards
Manufacturing process was expensive and cumbersome
The two most important members of the third generation were the IBM
System/360 and the DEC PDP-8
Integrated Circuits
A computer consists of gates, memory cells, and interconnections among these elements
The gates and memory cells are constructed of simple digital electronic components
• Data storage – provided by memory cells
• Data processing – provided by gates
• Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
• Control – the paths among components can carry control signals
The integrated circuit exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
Many transistors can be produced at the same time on a single wafer of silicon
Transistors can be connected with a process of metallization to form circuits
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every year
• Since the 1970s, development has slowed a little
– Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher
performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability
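The two doubling periods quoted above (every year, then every 18 months) can be compared with a short sketch. The function name and the starting count are assumptions for illustration; the projection is idealized exponential growth, not measured data.

```python
def projected_transistors(initial_count, years, doubling_period_years):
    """Idealized Moore's-law projection: the count doubles once per period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point of 2,300 transistors, projected over 10 years.
start = 2300
print(projected_transistors(start, 10, 1.0))   # doubling every year
print(projected_transistors(start, 10, 1.5))   # doubling every 18 months
```

Over a decade the yearly rate gives 2^10 doublings' worth of growth, while the 18-month rate gives 2^(10/1.5), so a small change in period compounds into a large difference.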
Growth in CPU Transistor Count
IBM 360 series
Announced in 1964
Product line was incompatible with older IBM machines
Was the success of the decade and cemented IBM as the overwhelmingly
dominant computer vendor
The architecture remains to this day the architecture of IBM’s mainframe
computers
Was the industry’s first planned family of computers
Models were compatible in the sense that a program written for one model should be
capable of being executed by another model in the series
Family Characteristics
• Similar or identical instruction set
• Similar or identical operating system
• Increasing speed
• Increasing number of I/O ports
• Increasing memory size
• Increasing cost
DEC - PDP-8 Bus Structure
Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor memory
• Chip was about the size of a single core
• Could hold 256 bits of memory
• Non-destructive
• Much faster than core
Semiconductor Memory
In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
• There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
• Developments in memory and processor technologies changed the nature of computers in less than a decade
Since 1970 semiconductor memory has been through 13 generations
• Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
Microprocessors
The density of elements on processor chips continued to rise
More and more elements were placed on each chip so that fewer and fewer chips were
needed to construct a single computer processor
1971 Intel developed 4004
First chip to contain all of the components of a CPU on a single chip
Birth of microprocessor
1972 Intel developed 8008
First 8-bit microprocessor
1974 Intel developed 8080
First general purpose microprocessor
Faster, has a richer instruction set, has a large addressing capability
Evolution of Intel Microprocessors
(a) 1970s Processors
Evolution of Intel Microprocessors
(b) 1980s Processors
Evolution of Intel Microprocessors
(c) 1990s Processors
Evolution of Intel Microprocessors
(d) Recent Processors
Intel Microprocessor Performance
The Evolution of the Intel x86 Architecture
Highlights of the Evolution of the Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)
8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and used in IBM’s first personal computer (securing the success of Intel)
80286
• Extension of the 8086 enabling addressing a 16-MB memory instead of just 1MB
80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking
80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor
Embedded Systems ARM
Embedded Systems Requirements
• Different sizes
– Different constraints, optimization, reuse
• Different requirements
– Safety, reliability, real-time, flexibility, legislation
– Lifespan
– Environmental conditions
– Static v dynamic loads
– Slow to fast speeds
– Computation v I/O intensive
– Discrete event v continuous dynamics
Possible Organization of an Embedded System
ARM Evolution
ARM Systems Categories
Performance Assessment Clock Speed
• Key parameters
– Performance, cost, size, security, reliability, power consumption
• System clock speed
– In Hz or multiples thereof (kHz, MHz, GHz)
– Clock rate, clock cycle, clock tick, cycle time
• Signals in CPU take time to settle down to 1 or 0
• Signals may change at different speeds
• Operations need to be synchronised
• Instruction execution in discrete steps
– Fetch, decode, load and store, arithmetic or logical
– Usually require multiple clock cycles per instruction
• Pipelining gives simultaneous execution of instructions
• So, clock speed is not the whole story
System Clock
Instruction Execution Rate
• Millions of instructions per second (MIPS)
• Millions of floating point instructions per
second (MFLOPS)
• Heavily dependent on instruction set,
compiler design, processor implementation,
cache & memory hierarchy
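The slide names the MIPS measure without a formula. A minimal sketch, assuming the common definition MIPS = clock frequency / (CPI × 10^6), where CPI is the average number of clock cycles per instruction. The function names and the instruction mix below are hypothetical.

```python
def average_cpi(instruction_mix):
    """Weighted average cycles per instruction.

    instruction_mix: iterable of (cycles, fraction_of_instructions) pairs
    whose fractions are expected to sum to 1.
    """
    return sum(cycles * fraction for cycles, fraction in instruction_mix)


def mips_rate(clock_hz, cpi):
    """Millions of instructions per second for a given clock rate and CPI."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return clock_hz / (cpi * 1e6)


# Hypothetical mix: 60% one-cycle, 18% two-cycle, 12% four-cycle, 10% eight-cycle.
mix = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]
cpi = average_cpi(mix)
print(cpi, mips_rate(400e6, cpi))   # 400 MHz clock assumed for illustration
```

The same clock rate yields a different MIPS figure whenever the instruction mix changes, which is why the measure depends so heavily on the instruction set, compiler, and memory hierarchy.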
Benchmarks
• Programs designed to test performance
• Written in high level language
– Portable
• Represents style of task
– Systems, numerical, commercial
• Easily measured
• Widely distributed
• E.g. Standard Performance Evaluation Corporation (SPEC)
– CPU2006 for computation bound
• 17 floating point programs in C, C++, Fortran
• 12 integer programs in C, C++
• 3 million lines of code
– Speed and rate metrics
• Single task and throughput
SPEC Speed Metric
• Single task
• Base runtime defined for each benchmark using reference machine
• Results are reported as ratio of reference time to system run time:
$r_i = \mathit{Tref}_i / \mathit{Tsut}_i$
– $\mathit{Tref}_i$ is the execution time for benchmark i on the reference machine
– $\mathit{Tsut}_i$ is the execution time of benchmark i on the test system
• Overall performance calculated by averaging ratios for all 12 integer benchmarks
– Use geometric mean: $r_G = \left( \prod_{i=1}^{n} r_i \right)^{1/n}$
– Appropriate for normalized numbers such as ratios
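The speed metric can be sketched in a few lines: one ratio of reference time to measured time per benchmark, combined with a geometric mean. The function names and the timings are made up for illustration and are not SPEC results.

```python
import math


def speed_ratios(ref_times, sut_times):
    """Per-benchmark ratio: reference-machine time / system-under-test time."""
    if len(ref_times) != len(sut_times) or not ref_times:
        raise ValueError("need one reference and one measured time per benchmark")
    return [ref / sut for ref, sut in zip(ref_times, sut_times)]


def geometric_mean(values):
    """n-th root of the product, computed with logs to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings in seconds for three benchmarks.
ref = [9770.0, 9650.0, 8050.0]
sut = [500.0, 700.0, 350.0]
print(geometric_mean(speed_ratios(ref, sut)))
```

A ratio above 1 means the tested system is faster than the reference machine on that benchmark.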
SPEC Rate Metric
• Measures throughput or rate of a machine carrying out a number of
tasks
• Multiple copies of benchmarks run simultaneously
– Typically, same as number of processors
• Ratio is calculated as follows:
$\mathit{rate}_i = \frac{N \times \mathit{Tref}_i}{\mathit{Tsut}_i}$
– $\mathit{Tref}_i$ is the reference execution time for benchmark i
– N is the number of copies run simultaneously
– $\mathit{Tsut}_i$ is the elapsed time from start of execution of program on all N
processors until completion of all copies of program
– Again, a geometric mean is calculated
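A minimal sketch of the rate metric, assuming the per-benchmark ratio N × Tref / Tsut followed by a geometric mean over the benchmarks. The function names and numbers are hypothetical.

```python
import math


def rate_ratio(copies, ref_time, elapsed_time):
    """Throughput ratio for one benchmark: N * Tref / Tsut."""
    if copies < 1 or elapsed_time <= 0:
        raise ValueError("need at least one copy and a positive elapsed time")
    return copies * ref_time / elapsed_time


def rate_metric(copies, ref_times, elapsed_times):
    """Geometric mean of the per-benchmark rate ratios."""
    ratios = [rate_ratio(copies, ref, elapsed)
              for ref, elapsed in zip(ref_times, elapsed_times)]
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical run: 4 copies of each of two benchmarks.
print(rate_metric(4, [1000.0, 1000.0], [500.0, 2000.0]))
```

Here the elapsed time is measured until the last of the N copies finishes, so a system that runs copies in parallel without slowing down scores higher.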
Amdahl’s Law
• Gene Amdahl [AMDA67]
• Potential speed up of program using multiple processors
• Concluded that:
– Code needs to be parallelizable
– Speedup is bounded, giving diminishing returns for more
processors
• Task dependent
– Servers gain by maintaining multiple connections on multiple
processors
– Databases can be split into parallel tasks
Amdahl’s Law Formula
• For program running on single processor
– Fraction f of code infinitely parallelizable with no scheduling overhead
– Fraction (1-f) of code inherently serial
– T is total execution time for program on single processor
– N is number of processors that fully exploit parallel portions of code
$\mathrm{Speedup} = \frac{T(1-f) + Tf}{T(1-f) + \frac{Tf}{N}} = \frac{1}{(1-f) + \frac{f}{N}}$
• Conclusions
– f small, parallel processors have little effect
– N → ∞, speedup bounded by 1/(1 – f)
• Diminishing returns for using more processors
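The conclusions above can be checked numerically with a small sketch of Amdahl's law, speedup = 1 / ((1 - f) + f / N). The function names and the chosen value of f are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup on N processors when a fraction f of the work is parallelizable."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("need at least one processor")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def speedup_limit(parallel_fraction):
    """Upper bound 1 / (1 - f) as the processor count grows without limit."""
    if parallel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


# With 90% of the code parallelizable (hypothetical), returns diminish quickly.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
print("limit", speedup_limit(0.9))
```

Each doubling of N buys less than the previous one, and no processor count pushes the speedup past the serial-fraction bound.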
Addressing Modes: How to Get Operands from Memory
Data Types and Sizes
Types
–Binary Integer
–Binary Coded Decimal (BCD)
–Floating Point
•IEEE 754
•Cray Floating Point
•Intel Extended Precision (80-bit)
–Packed Vector Data
–Addresses
•Width
–Binary Integer (8-bit, 16-bit, 32-bit, 64-bit)
–Floating Point (32-bit, 40-bit, 64-bit, 80-bit)
–Addresses (16-bit, 24-bit, 32-bit, 48-bit, 64-bit)
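To make the widths above concrete, the following sketch computes the value range of a two's-complement integer of a given width and shows the raw IEEE 754 bit pattern of a floating-point value. Two's-complement representation is an assumption here (the slide only says "binary integer"), and the helper names are hypothetical.

```python
import struct


def signed_range(bits):
    """(min, max) of a two's-complement integer with the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def float32_bits(value):
    """IEEE 754 single-precision encoding of value, as an unsigned 32-bit int."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def float64_bits(value):
    """IEEE 754 double-precision encoding of value, as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


for width in (8, 16, 32, 64):
    print(width, signed_range(width))
print(hex(float32_bits(1.0)), hex(float64_bits(1.0)))
```

The same numeric value occupies a different bit pattern at each width, which is why the ISA must fix both the type and the size of every operand.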
ISA Encoding
Fixed Width: Every Instruction has same width
•Easy to decode
(RISC Architectures: MIPS, PowerPC, SPARC, ARM…)
Ex: MIPS, every instruction 4-bytes
Variable Length: Instructions can vary in width
•Takes less space in memory and caches
(CISC Architectures: IBM 360, x86, Motorola 68k, VAX…)
Ex: x86, instructions from 1 byte up to 15 bytes (the architectural length limit)
Mostly Fixed or Compressed:
•Ex: MIPS16, THUMB (only two formats 2 and 4 bytes)
•PowerPC and some VLIWs (store instructions compressed, decompress into
Instruction Cache)
(Very) Long Instruction Word:
•Multiple instructions in a fixed width bundle
•Ex: Multiflow, HP/ST Lx, TI C6000
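Why fixed-width encodings are easy to decode can be seen in a short sketch: every field sits at a known bit position, so decoding is a handful of shifts and masks. The layout assumed here is the 32-bit MIPS R-type format (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code), which the slide does not spell out. The function name is hypothetical.

```python
def decode_r_type(instruction):
    """Extract the fields of a 32-bit MIPS R-type instruction word."""
    if not 0 <= instruction < (1 << 32):
        raise ValueError("instruction must fit in 32 bits")
    return {
        "opcode": (instruction >> 26) & 0x3F,  # bits 31..26
        "rs": (instruction >> 21) & 0x1F,      # bits 25..21, first source register
        "rt": (instruction >> 16) & 0x1F,      # bits 20..16, second source register
        "rd": (instruction >> 11) & 0x1F,      # bits 15..11, destination register
        "shamt": (instruction >> 6) & 0x1F,    # bits 10..6, shift amount
        "funct": instruction & 0x3F,           # bits 5..0, operation selector
    }


# 0x012A4020 encodes "add $8, $9, $10" under the assumed layout.
print(decode_r_type(0x012A4020))
```

A variable-length encoding such as x86 cannot be decoded this way, because the position of each field depends on the bytes that precede it.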
x86 (IA-32) Instruction Encoding
x86 and x86-64 Instruction Formats: instructions are 1 to 15 bytes long
(current processors reject longer encodings)
MIPS64 Instruction Encoding
ARM ARCHITECTURE
ARM is a family of RISC-based microprocessors and microcontrollers designed by ARM Holdings,
Cambridge, England. The company doesn’t make processors but instead designs microprocessor
and multicore architectures and licenses them to manufacturers. ARM chips are high-speed
processors that are known for their small die size and low power requirements.
• Cortex-A: application processors, intended for mobile devices such as smartphones and
eBook readers.
• Cortex-R: designed to support real-time applications, in which the timing of events needs
to be controlled with rapid response to events.
• Cortex-M: developed primarily for the microcontroller domain, where the need for fast,
highly deterministic interrupt management is coupled with the desire for extremely low gate
count and lowest possible power consumption.
Recap
Review Questions
What, in general terms, is the distinction between computer organization
and computer architecture?
What, in general terms, is the distinction between computer structure and
computer function?
List and briefly define the main structural components of a computer.
List and briefly define the main structural components of a processor.
What is a stored program computer?
What are the three principal constituents of a computer system?
Explain Moore’s law.
List and explain the key characteristics of a computer family.
Classes of Instructions
• Data Transfer
–LD, ST, MFC1, MTC1, MFC0, MTC0
•ALU
–ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
•Control Flow
–BEQZ, JR, JAL, TRAP, ERET
•Floating Point
–ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
•Multimedia (SIMD)
–ADD.PS, SUB.PS, MUL.PS, C.LT.PS
•String
–REP MOVSB (x86)
Thank you