Computer Architecture and Organization
[ECEg - 4163]
Chapter One:
Overview of Computer Architecture and
Organization
Prepared by Amanuel Z. & Satenaw S.
Outline
Basic Concepts and Computer Evolution
Performance
Computer system
1.1 Basic Concepts and Computer Evolution
1.1.1 Organization and Architecture
Computer architecture
Refers to those attributes of a system visible to a programmer, i.e., those
attributes that have a direct impact on the logical execution of a program.
It defines:
➔ Instruction sets
➔ Data representation
➔ Techniques for addressing memory
➔ I/O mechanisms
Cont’d...
Computer organization refers to the operational units and their
interconnections that realize the architectural specifications.
➔ Control signals;
➔ Interfaces between the computer and peripherals; and
➔ The memory technology used.
Cont’d...
IBM System/370 Architecture
Was introduced in 1970
Included a number of models
Could upgrade to a more expensive, faster model without having to
abandon original software
New models are introduced with improved technology, but retain the same
architecture so that the customer’s software investment is protected
Architecture has survived to this day as the architecture of IBM’s
mainframe product line
1.1.2 Structure and Function
A computer is a complex system; contemporary computers contain
millions of elementary electronic components.
How can one clearly describe them?
The key to clearly describe them is to recognize the hierarchical nature of
most complex systems, including the computer [SIMO96].
A hierarchical system is a set of interrelated subsystems, each of the latter,
in turn, hierarchical in structure until we reach some lowest level of
elementary subsystem.
The hierarchical nature of complex systems is essential to both their design
and their description.
Cont’d…
The designer need only deal with a particular level of the system at a time.
At each level, the system consists of a set of components and their
interrelationships.
The behavior at each level depends only on a simplified, abstracted
characterization of the system at the next lower level.
At each level, the designer is concerned with structure and function:
➔ Structure: The way in which the components are interrelated.
➔ Function: The operation of each individual component as part of the structure.
Cont’d…
In terms of description, we have two choices:
➔ Starting at the bottom and building up to a complete description, or
➔ Beginning with a top view and decomposing the system into its subparts.
Evidence from a number of fields suggests that the top-down approach is the
clearest and most effective.
Cont’d...
Function
There are four basic functions that a computer can perform:
Data processing: Data may take a wide variety of forms and the range of processing
requirements is broad
Data storage: Short-term/Long-term
Data movement
➔ Input–output (I/O): when data are received from or delivered to a device (peripheral) that is directly connected to the computer
➔ Data communications: when data are moved over longer distances, to or from a remote device
Control
➔ A control unit manages the computer’s resources and orchestrates the performance of its functional parts in response to instructions
Cont’d…
Figure 1.1 depicts the basic functions that a computer can perform; the
operating environment serves as the source and destination of data.
Figure 1.1 A Functional View of the Computer
Cont’d…
Figure 1.2 depicts the four possible types of operations.
Figure 1.2 Possible Computer Operations
Cont’d…
Structure:
Figure 1.3 is the simplest possible depiction of a computer.
The computer interacts in some fashion with its external environment.
All of its linkages to the external environment can be classified as
peripheral devices or communication lines.
Figure 1.3 The Computer
Cont’d…
The greater concern in this course is the internal structure of the computer
itself, which is shown in Figure 1.4.
Figure 1.4 The Computer: Top-Level Structure
Cont’d…
There are four main structural components:
Central processing unit (CPU): Controls the operation of the computer and
performs its data processing functions; often simply referred to as processor.
Main memory: Stores data.
I/O: Moves data between the computer and its external environment.
System interconnection: Some mechanism that provides for communication
among CPU, main memory, and I/O.
➔ A common example of system interconnection is a system bus, consisting of a number of conducting wires to which all the other components attach.
Cont’d…
CPU
Its major structural components are as follows:
Control unit: Controls the operation of the CPU and hence the computer.
Arithmetic and logic unit (ALU): Performs the computer’s data
processing functions.
Registers: Provides storage internal to the CPU.
CPU interconnection: Some mechanism that provides for communication
among the control unit, ALU, and registers.
Cont’d…
Multicore Computer
Structure
Figure 1.5 Simplified View of Major Elements of a Multicore Computer
Cont’d…
Central processing unit (CPU)
➔ Portion of the computer that fetches and executes instructions
➔ Consists of an ALU, a control unit, and registers
➔ Referred to as a processor in a system with a single processing unit
Core
➔ An individual processing unit on a processor chip
➔ May be equivalent in functionality to a CPU on a single-CPU system
➔ Specialized processing units are also referred to as cores
Processor
➔ A physical piece of silicon containing one or more cores
➔ Is the computer component that interprets and executes instructions
➔ Referred to as a multicore processor if it contains multiple cores
Cont’d…
Cache Memory
Multiple layers of memory between the processor and main memory
Is smaller and faster than main memory
Used to speed up memory access by placing in the cache data from main
memory that is likely to be used in the near future
A greater performance improvement may be obtained by using multiple
levels of cache, with level 1 (L1) closest to the core and additional levels
(L2, L3, etc.) progressively farther from the core
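The payoff of caching described above can be sketched with a simple average-access-time model. This is an illustrative formula, not part of the slides; the function name and the timing values are made up.

```python
def avg_access_time(hit_ratio, t_cache, t_main):
    """Average time per memory access with a single cache level.

    Hits cost t_cache; misses pay the cache lookup plus a main-memory
    access (t_cache + t_main). Times can be in any consistent unit.
    """
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)

# Illustrative (made-up) numbers: a 1 ns cache in front of 100 ns main
# memory with a 95% hit ratio brings the average down to about 6 ns.
print(avg_access_time(0.95, 1.0, 100.0))
```

The same formula applies recursively at each level, which is why adding L2 and L3 caches between L1 and main memory pays off.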
1.1.3 A Brief History of Computers
The First Generation: Vacuum Tubes
Vacuum tubes were used for digital logic elements and memory
IAS computer
➔ Fundamental design approach was the stored-program concept
✔ Attributed to the mathematician John von Neumann
✔ First publication of the idea was in 1945, for the EDVAC
➔ In 1946 design began at the Princeton Institute for Advanced Studies
➔ Completed in 1952
➔ Prototype of all subsequent general-purpose computers
Cont’d… Figure 1.6 IAS structure
AC: Accumulator register
MQ: multiply-quotient register
MBR: memory buffer register
IBR: instruction buffer register
PC: program counter
MAR: memory address register
IR: instruction register
Cont’d…
Figure 1.7 IAS Memory Format
Cont’d…
Registers
Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR
Instruction register (IR): Contains the 8-bit opcode of the instruction being executed
Instruction buffer register (IBR): Employed to temporarily hold the right-hand instruction from a word in memory
Program counter (PC): Contains the address of the next instruction pair to be fetched from memory
Accumulator (AC) and multiplier-quotient (MQ): Employed to temporarily hold operands and results of ALU operations
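Figure 1.7 gives the IAS memory format: a 40-bit word holding an instruction pair, i.e., two 20-bit instructions, each with an 8-bit opcode and a 12-bit address. The decoding that the IBR and IR support can be sketched in a few lines; the function name and the sample word below are made up for illustration.

```python
def decode_ias_word(word):
    """Split a 40-bit IAS word into its left and right 20-bit
    instructions, each returned as an (opcode, address) pair
    with an 8-bit opcode and a 12-bit address."""
    left = (word >> 20) & 0xFFFFF    # left-hand instruction
    right = word & 0xFFFFF           # right-hand instruction

    def split(instr):
        return instr >> 12, instr & 0xFFF   # (opcode, address)

    return split(left), split(right)

# Made-up word: opcode 0x01/address 0x0FA on the left,
# opcode 0x02/address 0x0FB on the right.
word = (0x01 << 32) | (0x0FA << 20) | (0x02 << 12) | 0x0FB
print(decode_ias_word(word))
```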
Cont’d…
M(X) = contents of memory
location whose address is X
(i:j) = bits i through j
Figure 1.8 The IAS Instruction Set
Cont’d…
Table 1.1 Partial Flowchart
of IAS Operation
Cont’d…
Second Generation: Transistors
Smaller
Cheaper
Dissipates less heat than a vacuum tube
Is a solid state device made from silicon
Was invented at Bell Labs in 1947
It was not until the late 1950s that fully transistorized computers were
commercially available
Cont’d…
Table 1.2 Computer Generations
Cont’d…
Second Generation
Introduced:
More complex arithmetic and logic units and control units
The use of high-level programming languages
Provision of system software which provided the ability to:
➔ Load programs
➔ Move data to peripherals
➔ Use libraries to perform common computations
Cont’d…
Figure 1.9 An IBM 7094 Configuration
Cont’d…
Discrete component
➔ Single, self-contained transistor
➔ Manufactured separately, packaged in their own containers, and soldered or wired together onto Masonite-like circuit boards
➔ Manufacturing process was expensive and cumbersome
Cont’d…
Third Generation: Integrated Circuits
1958 – the invention of the integrated circuit
Microelectronics
➔ “Small electronics”
The two most important members of the third generation were the IBM
System/360 and the DEC PDP-8
Cont’d…
(a) Gate
(b) Memory cell
Figure 1.10 Fundamental Computer Elements
Cont’d…
Integrated Circuits
Data storage – provided by memory cells
Data processing – provided by gates
Data movement – the paths among components are used to move data
from memory to memory and from memory through gates to memory
Control – the paths among components can carry control signals
Cont’d…
Integrated Circuits
A computer consists of gates, memory cells, and interconnections among these
elements
The gates and memory cells are constructed of simple digital electronic
components
Exploits the fact that such components as transistors, resistors, and conductors
can be fabricated from a semiconductor such as silicon
Many transistors can be produced at the same time on a single wafer of silicon
Transistors can be connected by a process of metallization to form circuits
Cont’d…
Figure 1.11 Relationship among Wafer, Chip, and Gate
Cont’d…
Figure 1.12 Growth in Transistor Count on Integrated Circuits
Cont’d…
Moore’s Law
1965; Gordon Moore, co-founder of Intel
Observed that the number of transistors that could be put on a single chip
was doubling every year
The pace slowed to a doubling every 18 months in the 1970s but has
sustained that rate ever since
Consequences of Moore’s law:
➔ The cost of computer logic and memory circuitry has fallen at a dramatic rate
➔ The electrical path length is shortened, increasing operating speed
➔ The computer becomes smaller and is more convenient to use in a variety of environments
➔ Reduction in power and cooling requirements
➔ Fewer interchip connections
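Moore's observation can be turned into a toy projection formula: a doubling every 18 months means multiplying by 2^(months/18). This is a sketch, not anything from the slides; the function name is made up, and the 18-month default is taken from the doubling rate quoted above.

```python
def projected_transistors(count_now, years, doubling_months=18):
    """Project a transistor count forward assuming one doubling
    every `doubling_months` months (Moore's post-1970s rate)."""
    return count_now * 2 ** (years * 12 / doubling_months)

# 3 years = 36 months = 2 doublings, so a count of 1000 becomes 4000.
print(projected_transistors(1000, 3))
```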
Cont’d…
IBM System/360
Announced in 1964
Product line was incompatible with older IBM machines
Was the success of the decade and cemented IBM as the overwhelmingly
dominant computer vendor
The architecture remains to this day the architecture of IBM’s mainframe
computers
Was the industry’s first planned family of computers
➔
Models were compatible in the sense that a program written for one model
should be capable of being executed by another model in the series
38
Cont’d…
Family Characteristics
Similar or identical instruction set
Similar or identical operating system
Increasing speed
Increasing number of I/O ports
Increasing memory size
Increasing cost
Figure 1.13 PDP-8 Bus Structure
Cont’d…
Later Generations
LSI: Large-Scale Integration
VLSI: Very-Large-Scale Integration
ULSI: Ultra-Large-Scale Integration
Two of the most important developments in later generations:
Semiconductor memory
Microprocessors
Cont’d…
Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor
memory
➔ Chip was about the size of a single magnetic core
➔ Could hold 256 bits of memory
➔ Non-destructive readout
➔ Much faster than core
Cont’d…
In 1974 the price per bit of semiconductor memory dropped below the price per bit of
core memory
➔ There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
➔ Developments in memory and processor technologies changed the nature of computers in less than a decade
Since 1970 semiconductor memory has been through 13 generations
➔ 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of this writing, 8 Gb on a single chip (1K = 2^10, 1M = 2^20, 1G = 2^30).
➔ Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time.
Cont’d…
Microprocessors
The density of elements on processor chips continued to rise
➔ More and more elements were placed on each chip so that fewer and fewer chips were needed to construct a single computer processor
1971: Intel developed the 4004
➔ First chip to contain all of the components of a CPU on a single chip
➔ Birth of the microprocessor
Cont’d…
1972: Intel developed the 8008
➔ First 8-bit microprocessor
1974: Intel developed the 8080
➔ First general-purpose microprocessor
➔ Faster, with a richer instruction set and a larger addressing capability
Cont’d…
Table 1.3 Evolution of Intel Microprocessors: (a) 1970s Processors, (b) 1980s Processors
Cont’d…
Table 1.3 Evolution of Intel Microprocessors: (c) 1990s Processors, (d) Recent Processors
Cont’d…
The Evolution of the Intel x86 Architecture
Two processor families are the Intel x86 and the ARM architectures
Current x86 offerings represent the results of decades of design effort on
complex instruction set computers (CISCs)
An alternative approach to processor design is the reduced instruction set
computer (RISC)
ARM architecture is used in a wide variety of embedded systems and is one
of the most powerful and best-designed RISC-based systems on the market
Cont’d…
Highlights of the Evolution of the Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)
8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and was used in IBM’s first personal computer (securing the success of Intel)
80286
• Extension of the 8086 enabling addressing of a 16-MB memory instead of just 1 MB
80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking
80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor
Cont’d…
Highlights of the Evolution of the Intel Product Line:
Pentium
Intel introduced the use of superscalar techniques, which allow multiple
instructions to execute in parallel
Pentium II
Incorporated Intel MMX technology, which is designed to process video,
audio, and graphics data efficiently
Pentium III
Incorporated additional floating-point instructions
Streaming SIMD Extensions (SSE)
Cont’d…
Highlights of the Evolution of the Intel Product Line:
Pentium 4
Includes additional floating-point and other enhancements for multimedia
Core
First Intel x86 micro-core
Core 2
Extends the Core architecture to 64 bits
Core 2 Quad provides four cores on a single chip
More recent Core offerings have up to 10 cores per chip
An important addition to the architecture was the Advanced Vector Extensions
instruction set
Cont’d…
ARM
Refers to a processor architecture that has evolved from RISC design principles
and is used in embedded systems
Family of RISC-based microprocessors and microcontrollers designed by ARM
Holdings, Cambridge, England.
Chips are high-speed processors that are known for their small die size and low
power requirements.
Probably the most widely used embedded processor architecture and indeed the
most widely used processor architecture of any kind in the world.
ARM originally stood for Acorn RISC Machine, and later Advanced RISC
Machine.
Cont’d…
ARM Products
Cortex-M
• Cortex-M0
• Cortex-M0+
• Cortex-M3
• Cortex-M4
Cortex-R
Cortex-A/Cortex-A50
1.2 Performance Issues
Designing for Performance
➔ Microprocessor Speed
➔ Performance Balance
➔ Improvements in Chip Organization and Architecture
Multicore, MICs, and GPGPUs
Two Laws that Provide Insight: Amdahl’s Law and Little’s Law
Basic Measures of Computer Performance
➔ Clock Speed, & Instruction Execution Rate
Calculating the Mean
➔ Arithmetic, Harmonic, and Geometric Mean
Benchmarks and SPEC
➔ Benchmark Principles, and SPEC Benchmarks
1.2.1 Designing for Performance
The cost of computer systems continues to drop dramatically, while the performance and capacity of
those systems continue to rise equally dramatically.
Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago.
Processors are so inexpensive that we now have microprocessors in devices we throw away.
Desktop applications that require the great power of today’s microprocessor-based systems include:
➔ Image processing
➔ Three-dimensional rendering
➔ Speech recognition
➔ Video conferencing
➔ Multimedia authoring
➔ Voice and video annotation of files
➔ Simulation modeling
Businesses are relying on increasingly powerful servers to handle transaction and database processing
and to support massive client/server networks that have replaced the huge mainframe computer centers
of yesteryear.
Cloud service providers use massive high-performance banks of servers to satisfy
high-volume, high-transaction-rate applications for a broad spectrum of clients.
Microprocessor Speed
Techniques built into contemporary processors include:
Pipelining: Processor moves data or instructions into a conceptual pipe with all stages of
the pipe processing simultaneously
Branch prediction: Processor looks ahead in the instruction code fetched from memory
and predicts which branches, or groups of instructions, are likely to be processed next
Superscalar execution: This is the ability to issue more than one instruction in every
processor clock cycle. (In effect, multiple parallel pipelines are used.)
Data flow analysis: Processor analyzes which instructions are dependent on each other’s
results, or data, to create an optimized schedule of instructions
Speculative execution: Using branch prediction and data flow analysis, some processors
speculatively execute instructions ahead of their actual appearance in the program
execution, holding the results in temporary locations, keeping execution engines as busy as
possible
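The benefit of pipelining can be quantified with the usual ideal-pipeline cycle count: a k-stage pipe takes k cycles to fill, then completes one instruction per cycle. A minimal sketch (function names are made up; real pipelines also stall on hazards, which this ignores):

```python
def pipelined_cycles(n, k):
    """Cycles to finish n instructions on an ideal k-stage pipeline:
    k cycles for the first instruction, then one per cycle after."""
    return k + (n - 1)

def unpipelined_cycles(n, k):
    """Cycles if each instruction runs through all k stages alone."""
    return n * k

# 100 instructions on a 5-stage pipeline: 104 cycles vs 500 without,
# a speedup approaching k as n grows.
print(pipelined_cycles(100, 5), unpipelined_cycles(100, 5))
```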
Performance Balance
Adjust the organization and architecture to compensate for the mismatch among the
capabilities of the various components
Architectural examples include:
Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather
than “deeper” and by using wide bus data paths
Change the DRAM interface to make it more efficient by including a cache or other
buffering scheme on the DRAM chip.
Reduce the frequency of memory access by incorporating increasingly complex and
efficient cache structures between the processor and main memory. This includes the
incorporation of one or more caches on the processor chip as well as on an off-chip cache
close to the processor chip.
Increase the interconnect bandwidth between processors and memory by using higher-speed
buses and a hierarchy of buses to buffer and structure data flow.
Cont’d…
Figure 1.14 Typical I/O Device Data Rates
Improvements in Chip Organization and Architecture
There are three approaches to achieving increased processor speed:
1. Increase hardware speed of the processor
✔ Fundamentally due to shrinking logic gate size: more gates, packed more tightly, increase the clock rate
✔ Propagation time for signals is reduced
2. Increase size and speed of caches
✔ Dedicating part of the processor chip to caches makes cache access times drop significantly
3. Change processor organization and architecture
✔ Increase effective speed of instruction execution
✔ Parallelism
Cont’d…
Problems with Clock Speed and Logic Density
Power
➔ Power density increases with density of logic and clock speed
➔ Dissipating heat becomes difficult
RC delay
➔ Speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them
➔ Delay increases as the RC product increases
➔ As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance
➔ Also, the wires are closer together, increasing capacitance
Memory latency and throughput
➔ Memory access speed (latency) and transfer speed (throughput) lag processor speeds
Cont’d…
Figure 1.15 Processor Trends
1.2.2 Multicore, MICs, and GPGPUs
Multicore
The use of multiple processors on the same chip provides the potential to
increase performance without increasing the clock rate
Strategy is to use two simpler processors on the chip rather than one more
complex processor
With two processors larger caches are justified
As caches became larger it made performance sense to create two and then
three levels of cache on a chip
Cont’d…
Many Integrated Core (MIC)
Offers a leap in performance as well as challenges in developing software to
exploit such a large number of cores
The multicore and MIC strategy involves a homogeneous collection of
general-purpose processors on a single chip
Graphics Processing Unit (GPU)
Core designed to perform parallel operations on graphics data
Traditionally found on a plug-in graphics card, it is used to encode and render
2D and 3D graphics as well as process video
Used as vector processors for a variety of applications that require repetitive
computations
Cont’d…
Since GPUs perform parallel operations on multiple sets of data, they are
increasingly being used as vector processors for a variety of applications
that require repetitive computations.
This blurs the line between the GPU and the CPU.
When a broad range of applications are supported by such a processor, the
term general-purpose computing on GPUs (GPGPU) is used.
1.2.3 Two Laws that Provide Insight: Amdahl’s Law and Little’s Law
Amdahl’s Law
Gene Amdahl
Deals with the potential speedup of a program using multiple processors
compared to a single processor
Illustrates the problems facing industry in the development of multicore
machines
➔ Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
Can be generalized to evaluate and design technical improvements in a
computer system
Cont’d…
Figure 1.16 Illustration of Amdahl’s Law
Cont’d…
Amdahl’s Law
Speedup = (time to execute program on a single processor) / (time to execute program on N parallel processors)
        = [T(1 - f) + Tf] / [T(1 - f) + Tf/N]
        = 1 / [(1 - f) + f/N]
where T is the total execution time on a single processor and f is the fraction
of the computation that can be parallelized.
Two important conclusions can be drawn:
1. When f is small, the use of parallel processors has little effect.
2. As N approaches infinity, speedup is bound by 1/(1 - f), so that there are
diminishing returns for using more processors.
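Both conclusions can be checked numerically from the speedup formula. A minimal sketch (the function name is made up):

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: speedup of a program whose parallelizable
    fraction is f when run on n processors."""
    return 1.0 / ((1.0 - f) + f / n)

# Even with 90% of the work parallelizable, 10 processors give
# only about a 5.3x speedup, and no processor count exceeds 10x.
print(amdahl_speedup(0.9, 10))
print(amdahl_speedup(0.9, 10**9))
```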
Cont’d…
Figure 1.17 Amdahl’s Law for Multiprocessors
Cont’d…
Little’s Law
Fundamental and simple relation with broad applications
Can be applied to almost any system that is statistically in steady state, and in which there is
no leakage.
Queuing system
➔ If the server is idle, an arriving item is served immediately; otherwise the item joins a queue
➔ There can be a single queue for a single server or for multiple servers, or multiple queues with one for each of multiple servers
Average number of items in a queuing system equals the average rate at which
items arrive multiplied by the average time that an item spends in the system
➔ Relationship requires very few assumptions
➔ Because of its simplicity and generality it is extremely useful
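The statement above is the whole law: L = λW. A one-line sketch (function and parameter names are made up):

```python
def items_in_system(arrival_rate, time_in_system):
    """Little's Law, L = lambda * W: average number of items in a
    steady-state queuing system, given the average arrival rate and
    the average time an item spends in the system."""
    return arrival_rate * time_in_system

# Made-up example: requests arriving at 10 per second, each spending
# 0.5 s in the system, means 5 requests in the system on average.
print(items_in_system(10.0, 0.5))
```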
1.2.4 Basic Measures of Computer Performance
Clock Speed
Figure 1.18 System Clock
Cont’d…
Table 1.4 Performance Factors and System Attributes
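The performance factors in Table 1.4 combine into the standard execution-time equation T = Ic × CPI × τ (instruction count times average cycles per instruction times cycle time, where τ = 1/f). A minimal sketch with made-up function names and example numbers:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Execution time T = Ic * CPI * tau, with tau = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz

def mips_rate(clock_rate_hz, cpi):
    """MIPS rate = f / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# 2 million instructions at an average CPI of 2 on a 1 GHz clock:
# 4 ms of execution time and a 500 MIPS rate.
print(cpu_time(2_000_000, 2, 1e9), mips_rate(1e9, 2))
```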
1.2.5 Calculating the Mean
The use of benchmarks to compare systems involves calculating the mean
value of a set of data points related to execution time.
The three common formulas used for calculating a mean are:
• Arithmetic
• Geometric
• Harmonic
Cont’d…
(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
MD = median
AM = arithmetic mean
GM = geometric mean
HM = harmonic mean
Figure 1.19 Comparison of Means on Various Data Sets (each set has a maximum data point value of 11)
Cont’d…
An Arithmetic Mean (AM) is an appropriate measure if the sum of all the measurements
is a meaningful and interesting value
The AM is a good candidate for comparing the execution time performance of several
systems
For example, suppose we were interested in using a system for large-scale simulation studies and
wanted to evaluate several alternative products. On each system we could run the simulation
multiple times with different input values for each run, and then take the average execution time
across all runs. The use of multiple runs with different inputs should ensure that the results are not
heavily biased by some unusual feature of a given input set. The AM of all the runs is a good
measure of the system’s performance on simulations, and a good number to use for system
comparison.
The AM used for a time-based variable, such as program execution time, has the
important property that it is directly proportional to the total time
➔ If the total time doubles, the mean value doubles
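The three means compared in Figure 1.19 can be computed directly. A minimal sketch (function names are made up):

```python
import math

def arithmetic_mean(xs):
    """Sum of the values divided by their count; proportional to
    total time, so suited to execution-time data."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """nth root of the product; suited to normalized ratios."""
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    """Reciprocal of the mean reciprocal; suited to rates."""
    return len(xs) / sum(1 / x for x in xs)

# For any non-constant data set, HM <= GM <= AM.
data = [1, 2, 4]
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
```

On the data sets of Figure 1.19 the three formulas diverge most on the outlier sets (f) and (g), which is why the choice of mean matters when summarizing benchmarks.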
Cont’d…
Table 1.5 A Comparison of Arithmetic and Harmonic Means for Rates
Cont’d…
Table 1.6 A Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
(b) Results normalized to Computer B
Cont’d…
Table 1.7 Another Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
(b) Results normalized to Computer B
1.2.6 Benchmarks and SPEC
Benchmark Principles
Desirable characteristics of a benchmark program:
1. It is written in a high-level language, making it portable across different
machines
2. It is representative of a particular kind of programming domain or paradigm,
such as systems programming, numerical programming, or commercial
programming
3. It can be measured easily
4. It has wide distribution
System Performance Evaluation Corporation (SPEC)
Benchmark suite
➔ A collection of programs, defined in a high-level language
➔ Together attempt to provide a representative test of a computer in a particular application or system programming area
SPEC
➔ An industry consortium
➔ Defines and maintains the best-known collection of benchmark suites aimed at evaluating computer systems
➔ Performance measurements are widely used for comparison and research purposes
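SPEC results are formed by taking, for each benchmark, the ratio of the reference machine's time to the system under test's time, and then combining the per-benchmark ratios with a geometric mean. A sketch with made-up function names and numbers:

```python
import math

def spec_ratio(reference_time, measured_time):
    """Ratio for one benchmark: reference-machine execution time
    divided by the system under test's execution time (bigger is
    better)."""
    return reference_time / measured_time

def overall_metric(ratios):
    """Combine per-benchmark ratios with a geometric mean."""
    return math.prod(ratios) ** (1 / len(ratios))

# Made-up numbers: a system that is 2x the reference machine on one
# benchmark and 8x on another earns an overall metric of 4.
ratios = [spec_ratio(100.0, 50.0), spec_ratio(800.0, 100.0)]
print(overall_metric(ratios))
```

The geometric mean is used here rather than the arithmetic mean because, as Tables 1.6 and 1.7 illustrate, it gives consistent rankings regardless of which machine the results are normalized to.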
Cont’d…
SPEC CPU2006
Best known SPEC benchmark suite
Industry standard suite for processor intensive applications
Appropriate for measuring performance for applications that spend most of their
time doing computation rather than I/O
Consists of 17 floating point programs written in C, C++, and Fortran and 12
integer programs written in C and C++
Suite contains over 3 million lines of code
Fifth generation of processor intensive suites from SPEC
Cont’d…
Table 1.8 SPEC CPU2006
Integer Benchmarks
Cont’d…
Table 1.9 SPEC CPU2006
Floating-Point Benchmarks
Cont’d…
Terms Used in SPEC Documentation
Benchmark
➔ A program written in a high-level language that can be compiled and executed on any computer that implements the compiler
System under test
➔ This is the system to be evaluated
Reference machine
➔ This is a system used by SPEC to establish a baseline performance for all benchmarks
➔ Each benchmark is run and measured on this machine to establish a reference time for that benchmark
Base metric
➔ These are required for all reported results and have strict guidelines for compilation
Peak metric
➔ This enables users to attempt to optimize system performance by optimizing the compiler output
Speed metric
➔ This is simply a measurement of the time it takes to execute a compiled benchmark
➔ Used for comparing the ability of a computer to complete single tasks
Rate metric
➔ This is a measurement of how many tasks a computer can accomplish in a certain amount of time
➔ This is called a throughput, capacity, or rate measure
➔ Allows the system under test to execute simultaneous tasks to take advantage of multiple processors
Cont’d…
Figure 1.20 SPEC Evaluation Flowchart
Cont’d…
Table 1.10 Some SPEC CINT2006 Results
(a) Sun Blade 1000
Cont’d…
1.3 Computer System
A Top-Level View of Computer Function and Interconnection
Computer Components
Computer Function
➔ Instruction Fetch and Execute
➔ Interrupts
➔ I/O Function
Interconnection Structures
Bus Interconnection
Cont’d…
At a top level, a computer consists of CPU (central processing unit),
memory, and I/O components, with one or more modules of each type.
These components are interconnected in some fashion to achieve the basic
function of the computer, which is to execute programs.
Thus, at a top level, we can characterize a computer system by describing:
1. The external behavior of each component, that is, the data and control
signals that it exchanges with other components, and
2. The interconnection structure and the controls required to manage the
use of the interconnection structure.
1.3.1 Computer Components
Contemporary computer designs are based on concepts developed by John von
Neumann at the Institute for Advanced Studies, Princeton
Referred to as the von Neumann architecture and is based on three key concepts:
➔ Data and instructions are stored in a single read-write memory
➔ The contents of this memory are addressable by location, without regard to the type of data contained there
➔ Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next
Hardwired program
➔ The result of the process of connecting the various components in the desired configuration
Cont’d…
(a) Programming in hardware
(b) Programming in software
Figure 1.21 Hardware and Software Approaches
Cont’d…
Software
A sequence of codes or instructions
Part of the hardware interprets each instruction and generates control
signals
Provide a new sequence of codes for each new program instead of rewiring
the hardware
Cont’d…
Major components:
CPU
➔ Instruction interpreter
➔ Module of general-purpose arithmetic and logic functions
I/O Components
➔ Input module
✔ Contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system
➔ Output module
✔ Means of reporting results
Memory and I/O interface registers:
Memory address register (MAR): Specifies the address in memory for the next read or write
Memory buffer register (MBR): Contains the data to be written into memory or receives the data read from memory
I/O address register (I/OAR): Specifies a particular I/O device
I/O buffer register (I/OBR): Used for the exchange of data between an I/O module and the CPU
Cont’d…
Figure 1.21 Hardware and Software Approaches
1.3.2 Computer Function
The basic function performed by a computer is execution of a program,
which consists of a set of instructions stored in memory.
The processor does the actual work by executing instructions specified in
the program.
Instruction processing consists of two steps:
➔ The processor reads (fetches) an instruction from memory, and then
➔ The processor executes the instruction.
Program execution consists of repeating this process of instruction fetch
and instruction execution, one instruction at a time.
Figure 1.22 Basic Instruction Cycle
Cont’d…
Instruction Fetch and Execute
At the beginning of each instruction cycle the processor fetches an
instruction from memory
The program counter (PC) holds the address of the instruction to be fetched
next
The processor increments the PC after each instruction fetch so that it will
fetch the next instruction in sequence
The fetched instruction is loaded into the instruction register (IR)
The processor interprets the instruction and performs the required action
95
Cont’d…
These actions fall into four categories:
Processor-memory: Data transferred from processor to memory or from
memory to processor.
Processor-I/O: Data transferred to or from a peripheral device by
transferring between the processor and an I/O module.
Data processing: The processor may perform some arithmetic or logic
operation on data.
Control: An instruction may specify that the sequence of execution be
altered.
96
Cont’d…
(a) Instruction format
(b) Integer format
(c) Internal CPU registers:
➔ Program counter (PC) = Address of instruction
➔ Instruction register (IR) = Instruction being executed
➔ Accumulator (AC) = Temporary storage
(d) Partial list of opcodes:
➔ 0001 = Load AC from memory
➔ 0010 = Store AC to memory
➔ 0101 = Add to AC from memory
Figure 1.23 Characteristics of a Hypothetical Machine 97
Cont’d…
Figure 1.24 Example of Program Execution (contents of memory and registers in hexadecimal) 98
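A minimal Python simulator makes the fetch-execute cycle of this hypothetical machine concrete. The bit layout (4-bit opcode and 12-bit address in a 16-bit word) and the sample program and data values below are assumptions in the spirit of Figures 1.23 and 1.24, not taken verbatim from the slides; halting on an unknown opcode is also an assumption of this sketch.

```python
# Minimal simulator of the hypothetical machine of Figure 1.23.
# Assumes 16-bit words: the top 4 bits are the opcode, the low
# 12 bits are a memory address.
LOAD, STORE, ADD = 0x1, 0x2, 0x5     # opcodes from Figure 1.23(d)

def run(memory, pc):
    ac = 0                            # accumulator (AC)
    while True:
        ir = memory[pc]               # fetch: instruction into IR
        pc += 1                       # increment PC after each fetch
        opcode, addr = ir >> 12, ir & 0xFFF
        if opcode == LOAD:            # 0001: Load AC from memory
            ac = memory[addr]
        elif opcode == STORE:         # 0010: Store AC to memory
            memory[addr] = ac
        elif opcode == ADD:           # 0101: Add to AC from memory
            ac += memory[addr]
        else:                         # unknown opcode: halt (assumption)
            break
    return memory

# Illustrative 3-instruction program:
#   AC <- [0x940]; AC <- AC + [0x941]; [0x941] <- AC
mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
       0x303: 0x0000,                 # unused opcode 0000 halts the loop
       0x940: 0x0003, 0x941: 0x0002}
run(mem, 0x300)
print(hex(mem[0x941]))  # 0x5  (3 + 2 stored back to memory)
```

Each loop iteration is one instruction cycle: the instruction at the PC is fetched into the IR, the PC is incremented, and the opcode field selects the action performed on the AC.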
Cont’d…
Figure 1.25 Instruction Cycle State Diagram
99
Interrupts
Table 1.11 Classes of Interrupts
100
Cont’d…
(a) No interrupts (b) Interrupts; short I/O wait (c) Interrupts; long I/O wait
* = interrupt occurs during course of execution of user program
Figure 1.25 Program Flow of Control without and with Interrupts 101
Cont’d…
Figure 1.26 Transfer of Control via Interrupts 102
Cont’d…
Figure 1.27 Instruction Cycle with Interrupts
103
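The interrupt cycle of Figure 1.27 can be sketched as follows (an illustration only; the function names and list-of-strings program representation are invented for this sketch). After each instruction the processor checks for a pending interrupt; if one is found, it saves the context of the current program, transfers control to the handler, and then resumes the interrupted program:

```python
# Sketch of program flow with interrupts (cf. Figures 1.26 and 1.27).
# Instructions are represented as strings; the returned trace shows
# the order in which they were executed.
def run_with_interrupts(program, handler, interrupt_after):
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])   # execute one user instruction
        pc += 1
        if pc == interrupt_after:   # interrupt signal pending?
            saved_pc = pc           # save context (return address)
            trace.extend(handler)   # transfer control to the handler
            pc = saved_pc           # restore context and resume
    return trace

user = ["i1", "i2", "i3", "i4"]
isr = ["save-state", "service-io", "restore-state"]
print(run_with_interrupts(user, isr, interrupt_after=2))
# ['i1', 'i2', 'save-state', 'service-io', 'restore-state', 'i3', 'i4']
```

The key point is that the user program contains no call to the handler: the processor itself suspends the program at an instruction boundary and later resumes it exactly where it left off.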
Cont’d…
Figure 1.28 Program Timing: Short I/O Wait
104
Cont’d…
Figure 1.29 Program Timing: Long I/O Wait
105
Cont’d…
Figure 1.30 Instruction Cycle State Diagram, with Interrupts
106
Cont’d…
Figure 1.31 Transfer of Control with Multiple Interrupts 107
Cont’d…
Figure 1.32 Example Time Sequence of Multiple Interrupts 108
I/O Function
An I/O module can exchange data directly with the processor
The processor can read data from or write data to an I/O module
➔ The processor identifies a specific device that is controlled by a particular I/O module
➔ Uses I/O instructions rather than memory-referencing instructions
In some cases it is desirable to allow I/O exchanges to occur directly with memory
➔ The processor grants to an I/O module the authority to read from or write to memory, so that the I/O-memory transfer can occur without tying up the processor
➔ The I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange
➔ This operation is known as direct memory access (DMA)
109
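The two I/O styles above can be contrasted in a short sketch (the function names and list-based memory model are illustrative, not from the slides). In programmed I/O every word passes through a processor register; with DMA the I/O module writes memory directly and the processor is free during the transfer:

```python
# Programmed I/O: the processor moves each word itself, so it is
# tied up for the duration of the transfer.
def programmed_io(device_buffer, memory, base):
    for i, word in enumerate(device_buffer):
        register = word               # processor reads from I/O module
        memory[base + i] = register   # processor writes to memory

# DMA: the I/O module issues write commands to memory directly;
# the processor only grants authority for the transfer.
def dma_transfer(device_buffer, memory, base):
    memory[base:base + len(device_buffer)] = device_buffer

mem = [0] * 8
dma_transfer([7, 8, 9], mem, base=2)
print(mem)  # [0, 0, 7, 8, 9, 0, 0, 0]
```

Both end with the same memory contents; the difference is who does the work, which is exactly what DMA is designed to change.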
1.3.3 Interconnection Structures
Figure 1.33 Computer Modules 110
Cont’d…
The interconnection structure must support the following types of transfers:
➔ Memory to processor: the processor reads an instruction or a unit of data from memory
➔ Processor to memory: the processor writes a unit of data to memory
➔ I/O to processor: the processor reads data from an I/O device via an I/O module
➔ Processor to I/O: the processor sends data to the I/O device
➔ I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access
111
1.3.4 Bus Interconnection
A communication pathway connecting two or more devices
➔
Key characteristic is that it is a shared transmission medium
Signals transmitted by any one device are available for reception by all other devices
attached to the bus
➔
If two devices transmit during the same time period their signals will overlap and
become garbled
Typically consists of multiple communication lines
➔
Each line is capable of transmitting signals representing binary 1 and binary 0
Computer systems contain a number of different buses that provide pathways between
components at various levels of the computer system hierarchy
System bus
➔
A bus that connects major computer components (processor, memory, I/O)
The most common computer interconnection structures are based on the use of one or
more system buses 112
Cont’d…
Data Bus
Data lines that provide a path for moving data among system modules
May consist of 32, 64, 128, or more separate lines
The number of lines is referred to as the width of the data bus
The number of lines determines how many bits can be transferred at a time
The width of the data bus is a key factor in determining overall system
performance
113
Address Bus
Used to designate the source or destination of the data on the data bus
➔ If the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines
Width determines the maximum possible memory capacity of the system
Also used to address I/O ports
➔ The higher-order bits are used to select a particular module on the bus, and the lower-order bits select a memory location or I/O port within the module
Control Bus
Used to control the access and the use of the data and address lines
Because the data and address lines are shared by all components, there must be a means of controlling their use
Control signals transmit both command and timing information among system modules
➔ Timing signals indicate the validity of data and address information
➔ Command signals specify operations to be performed
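The effect of bus widths is simple arithmetic, shown here with illustrative values (16 address lines and 32 data lines are assumptions for the example, not figures from the slides):

```python
# Back-of-envelope consequences of bus widths.
address_lines = 16
data_lines = 32

# Each extra address line doubles the number of addressable locations,
# so n address lines give 2**n distinct addresses.
max_addressable = 2 ** address_lines

# The data bus width is the number of bits moved in one bus transfer.
bits_per_transfer = data_lines

print(max_addressable)    # 65536 addressable locations
print(bits_per_transfer)  # 32 bits per transfer
```

This is why address-bus width bounds the maximum memory capacity of the system, while data-bus width is a key factor in overall transfer performance.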
Cont’d…
Figure 1.34 Bus Interconnection Scheme 115
Thank You !