
Chapter 1 Computer Architecture and Organization

The document provides an overview of computer architecture and organization, detailing the basic concepts, performance, and structure of computer systems. It discusses the evolution of computers from vacuum tubes to microprocessors, highlighting key architectures such as IBM System/370 and System/360. Additionally, it explains the hierarchical nature of computer systems, their main components, and the fundamental functions they perform.

Uploaded by

rohobotkolaso787

Computer Architecture and Organization

[ECEg - 4163]

Chapter One:
Overview of Computer Architecture and
Organization

Prepared by Amanuel Z. & Satenaw S.


Outline
• Basic Concepts and Computer Evolution
• Performance
• Computer system
1.1 Basic Concepts and Computer Evolution
1.1.1 Organization and Architecture
Computer architecture

Refers to those attributes of a system visible to a programmer or,

Those attributes that have a direct impact on the logical execution of a
program.

It defines:

Instruction sets

Data representation

Techniques for addressing memory

I/O mechanisms

Computer organization refers to the operational units and their
interconnections that realize the architectural specifications.

Control signals;

Interfaces between the computer and peripherals; and

The memory technology used.

IBM System/370 Architecture

Was introduced in 1970

Included a number of models

Could upgrade to a more expensive, faster model without having to
abandon original software

New models are introduced with improved technology, but retain the same
architecture so that the customer’s software investment is protected

Architecture has survived to this day as the architecture of IBM’s
mainframe product line

1.1.2 Structure and Function

A computer is a complex system; contemporary computers contain
millions of elementary electronic components.

How can one clearly describe them?

The key to clearly describe them is to recognize the hierarchical nature of
most complex systems, including the computer [SIMO96].

A hierarchical system is a set of interrelated subsystems, each of the latter,
in turn, hierarchical in structure until we reach some lowest level of
elementary subsystem.

The hierarchical nature of complex systems is essential to both their design
and their description.

The designer need only deal with a particular level of the system at a time.
At each level, the system consists of a set of components and their
interrelationships.

The behavior at each level depends only on a simplified, abstracted
characterization of the system at the next lower level.

At each level, the designer is concerned with structure and function:

Structure: The way in which the components are interrelated.

Function: The operation of each individual component as part of the
structure.


In terms of description, we have two choices:

Starting at the bottom and building up to a complete description, or

Beginning with a top view and decomposing the system into its sub
parts.

Evidence from a number of fields suggests that the top-down approach is
the clearest and most effective.
Function
There are four basic functions that a computer can perform:

Data processing: Data may take a wide variety of forms and the range of processing
requirements is broad

Data storage: Short-term/Long-term

Data movement

Input-output (I/O) - when data are received from or delivered to a device (peripheral)
that is directly connected to the computer

Data communications – when data are moved over longer distances, to or from a
remote device

Control

A control unit manages the computer’s resources and orchestrates the performance of
its functional parts in response to instructions
Figure 1.1 depicts the basic functions that a computer can perform. The operating environment is the source and destination of data.

Figure 1.1 A Functional View of the Computer

Figure 1.2 depicts the four possible types of operations.

Figure 1.2 Possible Computer Operations

Structure:

Figure 1.3 is the simplest possible depiction of a computer.

The computer interacts in some fashion with its external environment.
• All of its linkages to the external environment can be classified as peripheral devices or communication lines.

Figure 1.3 The Computer

• The greater concern in this course is the internal structure of the computer itself, which is shown in Figure 1.4.

Figure 1.4 The Computer: Top-Level Structure

There are four main structural components:

Central processing unit (CPU): Controls the operation of the computer and
performs its data processing functions; often simply referred to as processor.

Main memory: Stores data.

I/O: Moves data between the computer and its external environment.

System interconnection: Some mechanism that provides for communication
among CPU, main memory, and I/O.

A common example of system interconnection is by means of a system bus,
consisting of a number of conducting wires to which all the other
components attach.
CPU
Its major structural components are as follows:

Control unit: Controls the operation of the CPU and hence the computer.

Arithmetic and logic unit (ALU): Performs the computer’s data
processing functions.

Registers: Provide storage internal to the CPU.

CPU interconnection: Some mechanism that provides for communication
among the control unit, ALU, and registers.

Multicore Computer Structure

Figure 1.5 Simplified View of Major Elements of a Multicore Computer

• Central processing unit (CPU)
➔ Portion of the computer that fetches and executes instructions
➔ Consists of an ALU, a control unit, and registers
➔ Referred to as a processor in a system with a single processing unit

• Core
➔ An individual processing unit on a processor chip
➔ May be equivalent in functionality to a CPU on a single-CPU system
➔ Specialized processing units are also referred to as cores

• Processor
➔ A physical piece of silicon containing one or more cores
➔ Is the computer component that interprets and executes instructions
➔ Referred to as a multicore processor if it contains multiple cores
Cache Memory
• Multiple layers of memory between the processor and main memory
• Is smaller and faster than main memory
• Used to speed up memory access by placing in the cache data from main memory that is likely to be used in the near future
• A greater performance improvement may be obtained by using multiple levels of cache, with level 1 (L1) closest to the core and additional levels (L2, L3, etc.) progressively farther from the core

1.1.3 A Brief History of Computers
The First Generation: Vacuum Tubes

Vacuum tubes were used for digital logic elements and memory

IAS computer

Fundamental design approach was the stored program concept

Attributed to the mathematician John von Neumann

First publication of the idea was in 1945 for the EDVAC

In 1946 design began at the Princeton Institute for Advanced Studies

Completed in 1952

Prototype of all subsequent general-purpose computers
Figure 1.6 IAS structure

AC: accumulator register
MQ: multiply-quotient register
MBR: memory buffer register
IBR: instruction buffer register
PC: program counter
MAR: memory address register
IR: instruction register

Figure 1.7 IAS Memory Format

Registers

• Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
• Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR
• Instruction register (IR): Contains the 8-bit opcode instruction being executed
• Instruction buffer register (IBR): Employed to temporarily hold the right-hand instruction from a word in memory
• Program counter (PC): Contains the address of the next instruction pair to be fetched from memory
• Accumulator (AC) and multiplier quotient (MQ): Employed to temporarily hold operands and results of ALU operations
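
As a rough illustration of how an instruction pair is split between the IR, IBR, and MAR, the sketch below decodes one IAS memory word. It assumes the standard IAS layout (a 40-bit word holding two 20-bit instructions, each an 8-bit opcode followed by a 12-bit address). The function names and the sample word are hypothetical and do not come from the slides.

```python
WORD_BITS = 40    # assumed IAS word size
INSTR_BITS = 20   # assumed: two instructions per word
ADDR_BITS = 12    # assumed: 8-bit opcode + 12-bit address


def split_word(word):
    """Split a 40-bit word into its (left, right) 20-bit instructions."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("IAS words are 40 bits wide")
    left = word >> INSTR_BITS                 # executed first
    right = word & ((1 << INSTR_BITS) - 1)    # held in the IBR meanwhile
    return left, right


def decode(instr):
    """Split a 20-bit instruction into (opcode, address)."""
    opcode = instr >> ADDR_BITS                 # goes to the IR
    address = instr & ((1 << ADDR_BITS) - 1)    # goes to the MAR
    return opcode, address


# Hypothetical word: left = opcode 0x01 / address 0x0FA,
#                    right = opcode 0x05 / address 0x0FB
word = (0x01 << 32) | (0x0FA << 20) | (0x05 << 12) | 0x0FB
left, right = split_word(word)
print(decode(left), decode(right))
```

Because one memory fetch delivers two instructions, the right-hand one has to wait somewhere while the left-hand one executes, which is the role the IBR plays in the register list above.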
M(X) = contents of memory location whose address is X
(i:j) = bits i through j

Figure 1.8 The IAS Instruction Set

Table 1.1 Partial Flowchart of IAS Operation

Second Generation: Transistors

Smaller

Cheaper

Dissipates less heat than a vacuum tube

Is a solid state device made from silicon

Was invented at Bell Labs in 1947

It was not until the late 1950’s that fully transistorized computers were
commercially available

Table 1.2 Computer Generations

Second Generation
Introduced:

More complex arithmetic and logic units and control units

The use of high-level programming languages

Provision of system software which provided the ability to:

Load programs

Move data to peripherals

Use libraries that perform common computations

Figure 1.9 An IBM 7094 Configuration

• Discrete component

Single, self-contained transistor

Manufactured separately, packaged in their own containers, and
soldered or wired together onto Masonite-like circuit boards

Manufacturing process was expensive and cumbersome

Third Generation: Integrated Circuits

1958 – the invention of the integrated circuit

Microelectronics

Small electronics

The two most important members of the third generation were the IBM
System/360 and the DEC PDP-8

(a) Gate
(b) Memory cell

Figure 1.10 Fundamental Computer Elements

Integrated Circuits

Data storage – provided by memory cells

Data processing – provided by gates

Data movement – the paths among components are used to move data
from memory to memory and from memory through gates to memory

Control – the paths among components can carry control signals

Integrated Circuits

A computer consists of gates, memory cells, and interconnections among these
elements

The gates and memory cells are constructed of simple digital electronic
components

Exploits the fact that such components as transistors, resistors, and conductors
can be fabricated from a semiconductor such as silicon

Many transistors can be produced at the same time on a single wafer of silicon

Transistors can be connected with a process of metallization to form circuits
Figure 1.11 Relationship among Wafer, Chip, and Gate

Figure 1.12 Growth in Transistor Count on Integrated Circuits

Moore’s Law

1965; Gordon Moore – co-founder of Intel

Observed number of transistors that could be put on a single chip was doubling every year

The pace slowed to a doubling every 18 months in the 1970’s but has sustained that rate ever since

Consequences of Moore’s law:

• The cost of computer logic and memory circuitry has fallen at a dramatic rate
• The electrical path length is shortened, increasing operating speed
• Computer becomes smaller and is more convenient to use in a variety of environments
• Reduction in power and cooling requirements
• Fewer interchip connections
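
To get a feel for what a fixed doubling period implies, the short sketch below computes the growth factor after a given number of years. It is only an arithmetic illustration. The function name is hypothetical, and the 1.5-year default simply restates the 18-month figure quoted above.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Factor by which transistor count grows over `years`,
    assuming one doubling every `doubling_period_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2 ** (years / doubling_period_years)


for years in (3, 15):
    print(years, "years ->", growth_factor(years), "x")
```

Under this assumption, 3 years gives two doublings (a factor of 4) and 15 years gives ten doublings (a factor of 1024).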

IBM System/360

Announced in 1964

Product line was incompatible with older IBM machines

Was the success of the decade and cemented IBM as the overwhelmingly
dominant computer vendor

The architecture remains to this day the architecture of IBM’s mainframe
computers

Was the industry’s first planned family of computers

Models were compatible in the sense that a program written for one model
should be capable of being executed by another model in the series
Family Characteristics

Similar or identical instruction set

Similar or identical operating system

Increasing speed

Increasing number of I/O ports

Increasing memory size

Increasing cost

Figure 1.13 PDP-8 Bus Structure


Later Generations

LSI Large Scale Integration

VLSI Very Large Scale Integration

ULSI Ultra Large Scale Integration

Two of the most important developments in later generations:



Semiconductor Memory

Microprocessors

Semiconductor Memory

In 1970 Fairchild produced the first relatively capacious semiconductor
memory

Chip was about the size of a single core

Could hold 256 bits of memory

Non-destructive

Much faster than core

In 1974 the price per bit of semiconductor memory dropped below the price per bit of
core memory

There has been a continuing and rapid decline in memory cost accompanied by a
corresponding increase in physical memory density

Developments in memory and processor technologies changed the nature of
computers in less than a decade


Since 1970 semiconductor memory has been through 13 generations

1k, 4k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and, as of this writing, 8
Gb on a single chip (1 k = 2^10, 1 M = 2^20, 1 G = 2^30).

Each generation has provided four times the storage density of the previous
generation, accompanied by declining cost per bit and declining access time.

Microprocessors

The density of elements on processor chips continued to rise

More and more elements were placed on each chip so that fewer and
fewer chips were needed to construct a single computer processor

1971 Intel developed 4004

First chip to contain all of the components of a CPU on a single chip

Birth of microprocessor

1972 Intel developed 8008

First 8-bit microprocessor

1974 Intel developed 8080

First general purpose microprocessor

Faster, has a richer instruction set, has a large addressing capability

Table 1.3 Evolution of Intel Microprocessors
(a) 1970s Processors
(b) 1980s Processors
(c) 1990s Processors
(d) Recent Processors

The Evolution of the Intel x86 Architecture

Two processor families are the Intel x86 and the ARM architectures

Current x86 offerings represent the results of decades of design effort on
complex instruction set computers (CISCs)

An alternative approach to processor design is the reduced instruction set
computer (RISC)

ARM architecture is used in a wide variety of embedded systems and is one
of the most powerful and best-designed RISC-based systems on the market

Highlights of the Evolution of the Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)

8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and used in IBM’s first personal computer (securing the success of Intel)

80286
• Extension of the 8086 enabling addressing a 16-MB memory instead of just 1 MB

80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking

80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor

Pentium

Intel introduced the use of superscalar techniques, which allow multiple
instructions to execute in parallel
Pentium II

Incorporated Intel MMX technology, which is designed specifically to process
video, audio, and graphics data efficiently
Pentium III

Incorporated additional floating-point instructions

Streaming SIMD Extensions (SSE)
Pentium 4

Includes additional floating-point and other enhancements for multimedia
Core

First Intel x86 microprocessor with a dual core (two cores on a single chip)
Core 2

Extends the Core architecture to 64 bits

Core 2 Quad provides four cores on a single chip

More recent Core offerings have up to 10 cores per chip

An important addition to the architecture was the Advanced Vector Extensions
instruction set
ARM
• Refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems
• Family of RISC-based microprocessors and microcontrollers designed by ARM Holdings, Cambridge, England.
• Chips are high-speed processors that are known for their small die size and low power requirements.
• Probably the most widely used embedded processor architecture and indeed the most widely used processor architecture of any kind in the world.
• The name originally stood for Acorn RISC Machine and later for Advanced RISC Machine.

ARM Products

• Cortex-A/Cortex-A50
• Cortex-R
• Cortex-M
➔ Cortex-M0
➔ Cortex-M0+
➔ Cortex-M3
➔ Cortex-M4

1.2 Performance Issues
• Designing for Performance
➔ Microprocessor Speed
➔ Performance Balance
➔ Improvements in Chip Organization and Architecture
• Multicore, MICs, and GPGPUs
• Two Laws that Provide Insight: Amdahl’s Law and Little’s Law
• Basic Measures of Computer Performance
➔ Clock Speed and Instruction Execution Rate
• Calculating the Mean
➔ Arithmetic, Harmonic, and Geometric Mean
• Benchmarks and SPEC
➔ Benchmark Principles and SPEC Benchmarks

1.2.1 Designing for Performance
The cost of computer systems continues to drop dramatically, while the performance and capacity of
those systems continue to rise equally dramatically.

Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago.

Processors are so inexpensive that we now have microprocessors we throw away.

Desktop applications that require the great power of today’s microprocessor-based systems include:

➔ Image processing
➔ Three-dimensional rendering
➔ Speech recognition
➔ Video conferencing
➔ Multimedia authoring
➔ Voice and video annotation of files
➔ Simulation modeling

Businesses are relying on increasingly powerful servers to handle transaction and database processing
and to support massive client/server networks that have replaced the huge mainframe computer centers
of yesteryear.

Cloud service providers use massive high-performance banks of servers to satisfy high-volume, high-
transaction-rate applications for a broad spectrum of clients.
Microprocessor Speed
Techniques built into contemporary processors include:

Pipelining: Processor moves data or instructions into a conceptual pipe with all stages of
the pipe processing simultaneously

Branch prediction: Processor looks ahead in the instruction code fetched from memory
and predicts which branches, or groups of instructions, are likely to be processed next

Superscalar execution: This is the ability to issue more than one instruction in every
processor clock cycle. (In effect, multiple parallel pipelines are used.)

Data flow analysis: Processor analyzes which instructions are dependent on each other’s
results, or data, to create an optimized schedule of instructions

Speculative execution: Using branch prediction and data flow analysis, some processors
speculatively execute instructions ahead of their actual appearance in the program
execution, holding the results in temporary locations, keeping execution engines as busy as
possible
Performance Balance

Adjust the organization and architecture to compensate for the mismatch among the
capabilities of the various components
Architectural examples include:

Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather
than “deeper” and by using wide bus data paths

Change the DRAM interface to make it more efficient by including a cache or other
buffering scheme on the DRAM chip.

Reduce the frequency of memory access by incorporating increasingly complex and
efficient cache structures between the processor and main memory. This includes the
incorporation of one or more caches on the processor chip as well as on an off-chip cache
close to the processor chip.

Increase the interconnect bandwidth between processors and memory by using higher-speed
buses and a hierarchy of buses to buffer and structure data flow.
Figure 1.14 Typical I/O Device Data Rates

Improvements in Chip Organization and Architecture
There are three approaches to achieving increased processor speed:
1. Increase hardware speed of processor

Fundamentally due to shrinking logic gate size

More gates, packed more tightly, increasing clock rate

Propagation time for signals reduced
2. Increase size and speed of caches

Dedicating part of processor chip

Cache access times drop significantly
3. Change processor organization and architecture

Increase effective speed of instruction execution

Parallelism
Problems with Clock Speed and Logic Density
• Power

Power density increases with density of logic and clock speed

Dissipating heat

RC delay

The speed at which electrons flow is limited by the resistance and capacitance of the
metal wires connecting them

Delay increases as the RC product increases

As components on the chip decrease in size, the wire interconnects become thinner,
increasing resistance

Also, the wires are closer together, increasing capacitance

Memory latency and throughput

Memory access speed (latency) and transfer speed (throughput) lag processor speeds

Figure 1.15 Processor Trends


1.2.2 Multicore, MICs, and GPGPUs
Multicore

The use of multiple processors on the same chip provides the potential to
increase performance without increasing the clock rate

Strategy is to use two simpler processors on the chip rather than one more
complex processor

With two processors larger caches are justified

As caches became larger it made performance sense to create two and then
three levels of cache on a chip

Many Integrated Core (MIC)

Leap in performance as well as the challenges in developing software to exploit
such a large number of cores

The multicore and MIC strategy involves a homogeneous collection of general
purpose processors on a single chip

Graphics Processing Unit (GPU)



Core designed to perform parallel operations on graphics data

Traditionally found on a plug-in graphics card, it is used to encode and render
2D and 3D graphics as well as process video

Used as vector processors for a variety of applications that require repetitive
computations
• Since GPUs perform parallel operations on multiple sets of data, they are increasingly being used as vector processors for a variety of applications that require repetitive computations.
• This blurs the line between the GPU and the CPU.
• When a broad range of applications are supported by such a processor, the term general-purpose computing on GPUs (GPGPU) is used.

1.2.3 Two Laws that Provide Insight: Amdahl’s Law and Little’s Law
Amdahl’s Law
• Gene Amdahl

Deals with the potential speedup of a program using multiple processors
compared to a single processor

Illustrates the problems facing industry in the development of multi-core
machines

Software must be adapted to a highly parallel execution environment to
exploit the power of parallel processing


Can be generalized to evaluate and design technical improvement in a
computer system
Figure 1.16 Illustration of Amdahl’s Law

Amdahl’s Law

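Let T be the total execution time of the program on a single processor, f the fraction of that time that can be parallelized, and N the number of processors. Then:

$$\text{Speedup} = \frac{\text{Time to execute program on a single processor}}{\text{Time to execute program on } N \text{ parallel processors}} = \frac{T(1 - f) + Tf}{T(1 - f) + \dfrac{Tf}{N}} = \frac{1}{(1 - f) + \dfrac{f}{N}}$$
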
Two important conclusions can be drawn:
1. When f is small, the use of parallel processors has little effect.
2. As N approaches infinity, speedup is bound by 1/(1 - f ), so that there are
diminishing returns for using more processors.
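
The two conclusions can be checked numerically with a minimal sketch, assuming f is the parallelizable fraction of single-processor execution time and N the number of processors. The function names and sample values are illustrative only.

```python
def amdahl_speedup(f, n):
    """Speedup on n processors when a fraction f of the work is parallelizable."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def speedup_limit(f):
    """Upper bound on speedup as n grows without limit: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


# Small f: even 1000 processors barely help.
print(amdahl_speedup(0.1, 1000))
# Large f: speedup approaches, but never reaches, 1 / (1 - f).
for n in (10, 100, 1000):
    print(n, amdahl_speedup(0.9, n), "limit:", speedup_limit(0.9))
```

With f = 0.9 the bound is 10, so going from 100 to 1000 processors adds little, which is the diminishing-returns effect stated above.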
Figure 1.17 Amdahl’s Law for Multiprocessors

Little’s Law

Fundamental and simple relation with broad applications

Can be applied to almost any system that is statistically in steady state, and in which there is
no leakage.

Queuing system

If server is idle an item is served immediately, otherwise an arriving item joins a queue

There can be a single queue for a single server or for multiple servers, or multiple queues
with one being for each of multiple servers

Average number of items in a queuing system equals the average rate at which items arrive
multiplied by the time that an item spends in the system

Relationship requires very few assumptions

Because of its simplicity and generality it is extremely useful
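
Written as a formula, the relation above is L = λW, where L is the average number of items in the system, λ the average arrival rate, and W the average time an item spends in the system. The sketch below applies it in both directions. The function names and the request-rate numbers are hypothetical.

```python
def items_in_system(arrival_rate, time_in_system):
    """Little's Law: L = lambda * W."""
    return arrival_rate * time_in_system


def time_in_system(items, arrival_rate):
    """Little's Law rearranged: W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return items / arrival_rate


# Hypothetical server: 50 requests arrive per second and each spends
# 0.2 s in the system on average, so about 10 are in the system at once.
print(items_in_system(50, 0.2))
print(time_in_system(10, 50))
```

The same calculation holds for a single queue or for the whole system, provided the rates are steady-state averages.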
1.2.4 Basic Measures of Computer Performance
Clock Speed

Figure 1.18 System Clock

Table 1.4 Performance Factors and System Attributes

1.2.5 Calculating the Mean

The use of benchmarks to compare systems involves calculating the mean value of a set of data points related to execution time.

The three common formulas used for calculating a mean are:

• Arithmetic
• Geometric
• Harmonic

(a) Constant (11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)
(b) Clustered around a central value (3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11)
(c) Uniform distribution (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
(d) Large-number bias (1, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11)
(e) Small-number bias (1, 1, 2, 2, 3, 3, 5, 5, 8, 8, 11)
(f) Upper outlier (11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(g) Lower outlier (1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11)

MD = median
AM = arithmetic mean
GM = geometric mean
HM = harmonic mean

Figure 1.19 Comparison of Means on Various Data Sets (each set has a maximum data point value of 11)
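
The comparison in the figure can be reproduced with a short sketch using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The data sets are four of those listed above. The dictionary keys and the `summarize` helper are illustrative names only.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

data_sets = {
    "constant": [11] * 11,
    "uniform": list(range(1, 12)),
    "upper outlier": [11] + [1] * 10,
    "lower outlier": [1] + [11] * 10,
}


def summarize(values):
    """Return the median and the three means for one data set."""
    return {
        "MD": median(values),
        "AM": mean(values),
        "GM": geometric_mean(values),
        "HM": harmonic_mean(values),
    }


for name, values in data_sets.items():
    stats = summarize(values)
    print(name, {key: round(val, 3) for key, val in stats.items()})
```

For positive data the arithmetic mean is never below the geometric mean, which is never below the harmonic mean. The outlier sets show how strongly a single extreme value pulls each one.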

An Arithmetic Mean (AM) is an appropriate measure if the sum of all the measurements
is a meaningful and interesting value

The AM is a good candidate for comparing the execution time performance of several
systems
For example, suppose we were interested in using a system for large-scale simulation studies and
wanted to evaluate several alternative products. On each system we could run the simulation
multiple times with different input values for each run, and then take the average execution time
across all runs. The use of multiple runs with different inputs should ensure that the results are not
heavily biased by some unusual feature of a given input set. The AM of all the runs is a good
measure of the system’s performance on simulations, and a good number to use for system
comparison.
• The AM used for a time-based variable, such as program execution time, has the important property that it is directly proportional to the total time
➔ If the total time doubles, the mean value doubles

Table 1.5 A Comparison of Arithmetic and Harmonic Means for Rates

Table 1.6 A Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
(b) Results normalized to Computer B

Table 1.7 Another Comparison of Arithmetic and Geometric Means for Normalized Results
(a) Results normalized to Computer A
(b) Results normalized to Computer B

1.2.6 Benchmarks and SPEC
Benchmark Principles
• Desirable characteristics of a benchmark program:
1. It is written in a high-level language, making it portable across different
machines
2. It is representative of a particular kind of programming domain or paradigm,
such as systems programming, numerical programming, or commercial
programming
3. It can be measured easily
4. It has wide distribution

Standard Performance Evaluation Corporation (SPEC)
• Benchmark suite
➔ A collection of programs, defined in a high-level language
➔ Together attempt to provide a representative test of a computer in a particular application or system programming area

• SPEC
➔ An industry consortium
➔ Defines and maintains the best known collection of benchmark suites aimed at evaluating computer systems
➔ Performance measurements are widely used for comparison and research purposes

SPEC CPU2006
• Best known SPEC benchmark suite
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
• Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
• Suite contains over 3 million lines of code
• Fifth generation of processor intensive suites from SPEC

Table 1.8 SPEC CPU2006 Integer Benchmarks

Table 1.9 SPEC CPU2006 Floating-Point Benchmarks

Terms Used in SPEC Documentation

• Benchmark
➔ A program written in a high-level language that can be compiled and executed on any computer that implements the compiler

• System under test
➔ This is the system to be evaluated

• Reference machine
➔ This is a system used by SPEC to establish a baseline performance for all benchmarks
➔ Each benchmark is run and measured on this machine to establish a reference time for that benchmark

• Base metric
➔ These are required for all reported results and have strict guidelines for compilation

• Peak metric
➔ This enables users to attempt to optimize system performance by optimizing the compiler output

• Speed metric
➔ This is simply a measurement of the time it takes to execute a compiled benchmark
➔ Used for comparing the ability of a computer to complete single tasks

• Rate metric
➔ This is a measurement of how many tasks a computer can accomplish in a certain amount of time
➔ This is called a throughput, capacity, or rate measure
➔ Allows the system under test to execute simultaneous tasks to take advantage of multiple processors
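
The way a reference time feeds into a speed metric can be sketched as follows. The sketch assumes the usual SPEC convention that each benchmark's ratio is the reference time divided by the time on the system under test, and that the overall figure is the geometric mean of those ratios. The function names and all timing values are hypothetical, not measured results.

```python
from statistics import geometric_mean


def spec_ratios(ref_times, sut_times):
    """Per-benchmark ratio: reference time / system-under-test time."""
    if len(ref_times) != len(sut_times):
        raise ValueError("need one measured time per reference time")
    return [ref / sut for ref, sut in zip(ref_times, sut_times)]


def overall_speed_metric(ref_times, sut_times):
    """Geometric mean of the per-benchmark ratios (assumed convention)."""
    return geometric_mean(spec_ratios(ref_times, sut_times))


# Hypothetical run times in seconds for three benchmarks.
ref_times = [1000.0, 2000.0, 4000.0]
sut_times = [100.0, 400.0, 400.0]
print(spec_ratios(ref_times, sut_times))
print(round(overall_speed_metric(ref_times, sut_times), 3))
```

A ratio above 1 means the system under test ran that benchmark faster than the reference machine.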
Figure 1.20 SPEC Evaluation Flowchart

Table 1.10 Some SPEC CINT2006 Results
(a) Sun Blade 1000

1.3 Computer system
A Top-Level View of Computer Function and Interconnection

Computer Components

Computer Function

Instruction Fetch and Execute

Interrupts

I/O Function

Interconnection Structures

Bus Interconnection

At a top level, a computer consists of CPU (central processing unit),
memory, and I/O components, with one or more modules of each type.

These components are interconnected in some fashion to achieve the basic
function of the computer, which is to execute programs.

Thus, at a top level, we can characterize a computer system by describing
1. The external behavior of each component, that is, the data and control
signals that it exchanges with other components, and
2. The interconnection structure and the controls required to manage the
use of the interconnection structure.

1.3.1 Computer Components
Contemporary computer designs are based on concepts developed by John von
Neumann at the Institute for Advanced Studies, Princeton

Referred to as the von Neumann architecture and is based on three key concepts:

Data and instructions are stored in a single read-write memory

The contents of this memory are addressable by location, without regard to the
type of data contained there

Execution occurs in a sequential fashion (unless explicitly modified) from one
instruction to the next

Hardwired program

The result of the process of connecting the various components in the desired
configuration

(a) Programming in hardware

(b) Programming in software

Figure 1. Hardware and Software Approaches


Software

A sequence of codes or instructions

Part of the hardware interprets each instruction and generates control
signals

Provide a new sequence of codes for each new program instead of rewiring
the hardware

Major components:

CPU

Instruction interpreter

Module of general-purpose arithmetic and logic functions

I/O Components

Input module

Contains basic components for accepting data and instructions and
converting them into an internal form of signals usable by the system

Output module

Means of reporting results
Memory address register (MAR)
• Specifies the address in memory for the next read or write

Memory buffer register (MBR)
• Contains the data to be written into memory or receives the data read from memory

I/O address register (I/OAR)
• Specifies a particular I/O device

I/O buffer register (I/OBR)
• Used for the exchange of data between an I/O module and the CPU

Figure 1.21 Hardware and Software Approaches


1.3.2 Computer Function
• The basic function performed by a computer is execution of a program,
which consists of a set of instructions stored in memory.
• The processor does the actual work by executing instructions specified in
the program.
• Instruction processing consists of two steps: the processor reads (fetches)
instructions from memory one at a time and executes each instruction.

Figure 1.22 Basic Instruction Cycle

Instruction Fetch and Execute

At the beginning of each instruction cycle the processor fetches an
instruction from memory

The program counter (PC) holds the address of the instruction to be fetched
next

The processor increments the PC after each instruction fetch so that it will
fetch the next instruction in sequence

The fetched instruction is loaded into the instruction register (IR)

The processor interprets the instruction and performs the required action

These actions fall into four categories:

Processor-memory: Data transferred from processor to memory or from
memory to processor.

Processor-I/O: Data transferred to or from a peripheral device by
transferring between the processor and an I/O module.

Data processing: The processor may perform some arithmetic or logic
operation on data.

Control: An instruction may specify that the sequence of execution be
altered.


(a) Instruction format

(b) Integer format

(c) Internal CPU registers
Program counter (PC) = Address of instruction
Instruction register (IR) = Instruction being executed
Accumulator (AC) = Temporary storage

(d) Partial list of opcodes
0001 = Load AC from memory
0010 = Store AC to memory
0101 = Add to AC from memory

Figure 1.23 Characteristics of a Hypothetical Machine


Figure 1.24 Example of Program Execution (contents of memory and registers in hexadecimal)
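The fetch and execute steps of the hypothetical machine can be sketched in a few lines of Python. This is a minimal model, not the figure itself: it assumes 16-bit words with a 4-bit opcode and a 12-bit address, it treats integers as plain 16-bit values, and the memory contents are sample values chosen for illustration. Only the three opcodes listed above (0001, 0010, 0101) are handled.

```python
# Opcodes from the partial list of the hypothetical machine
LOAD, STORE, ADD = 0b0001, 0b0010, 0b0101


def run(memory, pc, steps):
    """Execute `steps` instruction cycles starting at address `pc`."""
    ac = 0
    for _ in range(steps):
        ir = memory[pc]              # fetch: instruction goes into IR
        pc += 1                      # PC now points at the next instruction
        opcode = ir >> 12            # assumed format: upper 4 bits are the opcode
        address = ir & 0xFFF         # assumed format: lower 12 bits are the address
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"opcode {opcode:04b} is not in the partial list")
    return pc, ac


# Sample program (assumed values): load from 940, add from 941, store to 941
memory = {
    0x300: 0x1940,
    0x301: 0x5941,
    0x302: 0x2941,
    0x940: 0x0003,
    0x941: 0x0002,
}
pc, ac = run(memory, 0x300, 3)
```

After three instruction cycles the accumulator and location 941 both hold 0005, and the PC holds 303.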

Figure 1.25 Instruction Cycle State Diagram


Interrupts
Table 1.11 Classes of Interrupts


(a) No interrupts (b) Interrupts; short I/O wait (c) Interrupts; long I/O wait
Figure 1.25 Program Flow of Control without and with Interrupts

Figure 1.26 Transfer of Control via Interrupts


Figure 1.27 Instruction Cycle with Interrupts
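The interrupt stage can be pictured as one extra check at the end of each instruction cycle. The sketch below is a simplified model under stated assumptions: instructions are just labels, an interrupt is represented by the index of the user instruction during which it arrives, and the handler runs to completion before the user program resumes. All names are hypothetical.

```python
def run_with_interrupts(user_program, handler, pending_after):
    """Return the order in which instructions execute.

    pending_after: set of user-program indices; an interrupt becomes
    pending while the instruction at that index is executing.
    """
    trace = []
    pc = 0
    while pc < len(user_program):
        trace.append(user_program[pc])   # fetch and execute stages
        executed = pc
        pc += 1
        if executed in pending_after:    # interrupt stage: check for a pending interrupt
            saved_pc = pc                # save the address of the next user instruction
            trace.extend(handler)        # run the interrupt handler
            pc = saved_pc                # resume the user program where it left off
    return trace
```

With a three-instruction user program and an interrupt arriving during the second instruction, the handler runs between the second and third user instructions.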



Figure 1.28 Program Timing: Short I/O Wait



Figure 1.29 Program Timing: Long I/O Wait


Figure 1.30 Instruction Cycle State Diagram, with Interrupts



Figure 1.31 Transfer of Control with Multiple Interrupts


Figure 1.32 Example Time Sequence of Multiple Interrupts


I/O Function

I/O module can exchange data directly with the processor

Processor can read data from or write data to an I/O module

Processor identifies a specific device that is controlled by a particular I/O
module

The exchange uses I/O instructions rather than memory-referencing instructions

In some cases it is desirable to allow I/O exchanges to occur directly with
memory

The processor grants an I/O module the authority to read from or write to
memory so that the I/O-memory transfer can occur without tying up the
processor

The I/O module issues read or write commands to memory, relieving the
processor of responsibility for the exchange

This operation is known as direct memory access (DMA)
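The difference between the two styles of exchange can be sketched by counting how many word transfers the processor itself performs. This is a toy model under stated assumptions: memory is a dictionary, the `IOModule` class and both functions are hypothetical, and setup and completion signalling are left out.

```python
class IOModule:
    """Hypothetical I/O module holding words that arrived from a device."""

    def __init__(self, words):
        self.words = list(words)

    def read_word(self):
        # Processor-I/O exchange: hand one word to the processor
        return self.words.pop(0)

    def dma_write(self, memory, start, count):
        # DMA: the module itself issues the write commands to memory
        for offset in range(count):
            memory[start + offset] = self.words.pop(0)


def programmed_io(module, memory, start, count):
    """The processor moves every word; returns how many transfers it performed."""
    processor_transfers = 0
    for offset in range(count):
        memory[start + offset] = module.read_word()
        processor_transfers += 1
    return processor_transfers


def dma(module, memory, start, count):
    """The processor only grants access; the module moves the data."""
    module.dma_write(memory, start, count)
    return 0  # word transfers performed by the processor


data = [10, 20, 30, 40]
mem_a, mem_b = {}, {}
cpu_moves_pio = programmed_io(IOModule(data), mem_a, 0x100, len(data))
cpu_moves_dma = dma(IOModule(data), mem_b, 0x100, len(data))
```

Both calls leave the same words in memory; only the number of transfers handled by the processor differs.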
1.3.3 Interconnection Structures

Figure 1.33 Computer Modules

The interconnection structure must support the following types of transfers:

Memory to processor: Processor reads an instruction or a unit of data from memory

Processor to memory: Processor writes a unit of data to memory

I/O to processor: Processor reads data from an I/O device via an I/O module

Processor to I/O: Processor sends data to the I/O device

I/O to or from memory: An I/O module is allowed to exchange data directly with memory without going through the processor using direct memory access
1.3.4 Bus Interconnection

A bus is a communication pathway connecting two or more devices

Key characteristic is that it is a shared transmission medium

Signals transmitted by any one device are available for reception by all other devices
attached to the bus

If two devices transmit during the same time period their signals will overlap and
become garbled

Typically consists of multiple communication lines

Each line is capable of transmitting signals representing binary 1 and binary 0

Computer systems contain a number of different buses that provide pathways between
components at various levels of the computer system hierarchy

System bus

A bus that connects major computer components (processor, memory, I/O)

The most common computer interconnection structures are based on the use of one or
more system buses
Data Bus
• Data lines that provide a path for moving data among system modules
• May consist of 32, 64, 128, or more separate lines
• The number of lines is referred to as the width of the data bus
• The number of lines determines how many bits can be transferred at a time
• The width of the data bus is a key factor in determining overall system
performance
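A small calculation shows why data bus width matters for performance. The sketch assumes that each bus transfer moves exactly one bus-width of bits; the function name is hypothetical.

```python
def bus_transfers(item_bits, bus_width_bits):
    """Number of bus transfers needed to move an item of `item_bits` bits."""
    # Ceiling division: a partly filled final transfer still costs a full transfer
    return -(-item_bits // bus_width_bits)
```

For example, a 64-bit instruction needs two transfers on a 32-bit data bus but only one on a 64-bit data bus.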

Address Bus
• Used to designate the source or destination of the data on the data bus
• If the processor wishes to read a word of data from memory it puts the
address of the desired word on the address lines
• Width determines the maximum possible memory capacity of the system
• Also used to address I/O ports
• The higher order bits are used to select a particular module on the bus and
the lower order bits select a memory location or I/O port within the module

Control Bus
• Used to control the access and the use of the data and address lines
• Because the data and address lines are shared by all components there must
be a means of controlling their use
• Control signals transmit both command and timing information among system
modules
• Timing signals indicate the validity of data and address information
• Command signals specify operations to be performed
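The two address bus properties (width limits capacity; higher order bits select a module) can be illustrated with a short sketch. The split between module bits and location bits is an assumption made for the example, and the function names are hypothetical.

```python
def addressable_locations(address_lines):
    """Maximum number of distinct locations an address bus of this width can name."""
    return 2 ** address_lines


def decode(address, address_lines, module_bits):
    """Split an address into (module number, location within the module)."""
    offset_bits = address_lines - module_bits
    module = address >> offset_bits               # higher order bits select the module
    offset = address & ((1 << offset_bits) - 1)   # lower order bits select the location or port
    return module, offset
```

On an 8-bit address bus with one module-select bit, address 01111111 is the last location in module 0 and address 10000000 is the first location or port in module 1.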

Figure 1.34 Bus Interconnection Scheme


Thank You !
