COMPUTER ORGANIZATION AND DESIGN ARM
Edition
The Hardware/Software Interface
EE360
EMBEDDED SYSTEMS
Slides adapted from the original slides provided in the
instructor resources for the text
About us
• Ribhu ribhufec@iitg.ac.in
• Akankhya Sarmah akank174102022@iitg.ac.in
• Biplab Sengyung sengy174102050@iitg.ac.in
• Soumendu Ghosh ghosh174102019@iitg.ac.in
Chapter 1 — Computer Abstractions and Technology — 2
Book(s)
• Computer Organization and Design by Patterson and Hennessy.
• Indian Edition not available
• http://staff.ustc.edu.cn/~llxx/cod/reference_books_tools/Computer%20Organization%20and%20Design%20ARM%20edition.pdf
• We must thank USTC
China
• Do NOT host this.
Book(s)
• Computer Organization and Design: The
Hardware/Software Interface, 4th ed. (With CD-ROM)
ARM Edition Paperback – 2009
• Available in the Institute library and seen in the Core-I
bookstore.
Scoring
[Chart: score split among End Term, Mid Term, Quiz, and Project; slice labels 15%, 10%, 45%, 30%]
Class Schedule
• Thursdays : Double Lectures
• No classes from 07/02/2019 to 18/02/2019
• Quiz 1 on 07/02/2019 during regular class hours
• Quiz 2 to be announced
• Randomly sampled attendance
About the Project
• Can be a coding project or a history project
• 3 members per group.
• Find your own problems based on the hints in class and
ensure their uniqueness.
• Since the project carries a 15% weight, you are expected
to spend 15-20 hours on it.
• 15 minutes for evaluation
• History projects: a term paper on the history of a
particular aspect of computer architecture
• History projects require a PowerPoint presentation and
a written report.
• Grammar will also be evaluated in this case.
More about the project
• Coding projects need to be well commented and
should run
• Coding projects need to be in ARMv8.
• Complexity as well as implementation count
• To be presented to the CI and TAs.
• The last date is fixed
• No project presented after the last date will be
evaluated.
COMPUTER ORGANIZATION AND DESIGN ARM
Edition
The Hardware/Software Interface
CHAPTER 1
Computer Abstractions and Technology
The Three Great Revolutions
Agricultural
Industrial
Computer
§1.1 Introduction
The Computer Revolution
• Progress in computer technology
• Underpinned by Moore’s Law
• Every tenfold decrease in cost opens up new
applications
• Makes novel applications feasible
• Computers in automobiles
• Cell phones/ Smart Phones
• Human genome project
• World Wide Web
• Computers are pervasive
Classes of Computers
• Personal computers
• General purpose, variety of software
• Subject to cost/performance tradeoff
• Server computers
• Network based
• High capacity, performance, reliability
• Range from small servers to building sized
Classes of Computers
• Supercomputers
• High-end scientific and
engineering calculations
• Highest capability
• Small fraction of the overall computer market
• Embedded computers
• Hidden as components of
systems
• Stringent
power/performance/cost
constraints
• Computers as components
• You don’t even know that you
are using a computer
The PostPC Era
Cell Phones
Tablets
The Post-PC Era
Personal Mobile Device (PMD)
• Battery operated
• Connects to the Internet
• Costs a few thousand rupees
• IoT
• Smart phones, tablets, electronic glasses
Cloud computing
• Warehouse Scale Computers (WSC)
• Software as a Service (SaaS)
• Portion of software run on a PMD and a portion run in the Cloud
• Amazon and Google
What You Will Learn
• How programs are translated into machine
language
• And how the hardware executes them
• The hardware/software interface
• What determines program performance
• And how it can be improved
• How hardware designers improve performance
• What parallel processing is
What Decides the performance
of a computer system?
Algorithm
• Determines number of operations executed
• DFT/FFT
Programming language, compiler, architecture
• Determine number of machine instructions executed per operation
Processor and memory system
• Determine how fast instructions are executed
I/O system (including OS)
• Determines how fast I/O operations are executed
§1.2 Eight Great Ideas in Computer Architecture
Eight Great Ideas
Design for Moore’s Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
§1.3 Below Your Program
The Layers of Abstraction
Application software
• Written in high-level
language
System software
• Compiler: translates HLL code
to machine code
• Operating System: service code
• Handling input/output
• Managing memory and
storage
• Scheduling tasks & sharing
resources
Hardware
• Processor, memory, I/O
controllers
Breakthrough
• You can write a program to convert a high level
program to machine language
• Example :
Levels of Program Code
• High-level language
• Level of abstraction closer to
problem domain
• Provides for productivity and
portability
• Assembly language
• Textual representation of
instructions
• Hardware representation
• Binary digits (bits)
• Encoded instructions and data
§1.4 Under the Covers
Components of a Computer
The BIG Picture
•Same components for all kinds of computer
• Desktop, server, embedded
•Input/output includes
• User-interface devices
• Display, keyboard, mouse
• Storage devices
• Hard disk, CD/DVD, flash
• Network adapters
• For communicating with other
computers
Through the Looking Glass
• LCD screen: picture elements (pixels)
• Mirrors content of frame buffer memory
Touchscreen
•PostPC device
•Supersedes keyboard
and mouse
•Resistive and
Capacitive types
• Most tablets, smart
phones use capacitive
• Capacitive allows
multiple touches
simultaneously
Inside an iPad 2
[Figure labels: capacitive multitouch LCD screen; 3.8 V, 25 watt-hour battery; computer board]
Inside the Processor (CPU)
•Datapath: performs operations on
data
•Control: sequences datapath,
memory, ...
•Cache memory
•Small fast SRAM memory for immediate
access to data
Inside Apple A5
Abstractions
The BIG Picture
•Abstraction helps us deal with complexity
•Hide lower-level detail
•The hardware/software interface is called the
instruction set architecture (ISA)
•Application binary interface
•The ISA plus system software interface
•Implementation
•The details underlying an interface
Memory
•Volatile main memory
•Loses instructions and data when
power off
•Non-volatile secondary
memory
•Magnetic disk
•Flash memory (limited number of write cycles)
Networks
•Communication, resource sharing, nonlocal
access
•Local area network (LAN): Ethernet
•Wide area network (WAN): the Internet
•Wireless network: WiFi, Bluetooth
§1.5 Technologies for Building Processors and Memory
Technology Trends
DRAM Capacity
Year   Technology    Relative performance/cost
1951   Vacuum tube   1
1965   Transistor    35
1975   IC            900
1995   VLSI          2,400,000
2013   ULSI          250,000,000,000
Semiconductor Technology
•Silicon:semiconductor
•Add materials to transform
properties:
•Conductors
•Insulators
•Switch
Manufacturing ICs
[Figure: IC manufacturing process; silicon ingot 8”-12” in diameter, 12”-24” in length]
•Yield: proportion of working dies per wafer
Intel Core i7 (2012) Wafer
• 300mm wafer, 280 chips, 32nm technology
• Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost
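$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per unit area} \times \text{Die area}/2\right)^2}$$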
Integrated Circuit Cost
•Nonlinear relation to area and defect rate
•Wafer cost and area are fixed
•Defect rate determined by manufacturing process
•Die area determined by architecture and circuit
design
•The third equation (yield) is empirical
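The cost relations for this slide can be tried out numerically. The sketch below is only an illustration: the wafer cost and defect density are made-up values, and the dies-per-wafer estimate ignores dies lost at the wafer edge. The 300 mm wafer and the 20.7 x 10.5 mm die are taken from the Core i7 slide.

```python
import math

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> float:
    """Approximate dies per wafer as wafer area / die area (edge loss ignored)."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return wafer_area_mm2 / die_area_mm2

def die_yield(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Empirical yield model: 1 / (1 + defects per unit area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2.0) ** 2

def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float, defects_per_mm2: float) -> float:
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    return wafer_cost / (dies * die_yield(defects_per_mm2, die_area_mm2))

# Hypothetical inputs: wafer cost and defect density are assumptions.
die_area = 20.7 * 10.5  # mm^2, die size from the Core i7 slide
print(f"dies/wafer ~ {dies_per_wafer(300.0, die_area):.0f}")
print(f"yield      ~ {die_yield(0.001, die_area):.2f}")
print(f"cost/die   ~ {cost_per_die(5000.0, 300.0, die_area, 0.001):.2f}")
```

Doubling the die area both halves the dies per wafer and lowers the yield, which is why the cost grows faster than linearly with area.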
§1.6 Performance
Defining Performance
•Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]
Response Time and Throughput
•Response time
• How long it takes to do a task
•Throughput
• Total work done per unit time
• e.g., tasks/transactions/… per hour
•How are response time and throughput
affected by
• Replacing the processor
with a faster version?
• Adding more processors?
•We’ll focus on response time for now…
Relative Performance
•Define Performance = 1/Execution Time
•“X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
Example: time taken to run a program
10s on A, 15s on B
Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
So A is 1.5 times faster than B
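The same comparison as a small sketch. The function name `speedup` is only an illustrative choice; the times are the ones from the example.

```python
def speedup(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = time_Y / time_X.
    """
    return time_y_s / time_x_s

# Program takes 10 s on A and 15 s on B.
n = speedup(time_x_s=10.0, time_y_s=15.0)
print(f"A is {n} times faster than B")
```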
Measuring Execution Time
•Elapsed time
•Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
•Determines system performance
•CPU time
•Time spent processing a given job
• Discounts I/O time, other jobs’ shares
•Comprises user CPU time and system CPU
time
•Determines CPU performance
CPU Clocking
• Operation of digital hardware governed by a constant-
rate clock
[Figure: clock waveform marking one clock period; data transfer and computation take place within the cycle, followed by the state update]
Clock period: duration of a clock cycle
e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second
e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
•Performance improved by
• Reducing number of clock cycles
• Increasing clock rate
• Hardware designer must often trade off clock rate
against cycle count
CPU Time Example
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
• Aim for 6s CPU time
• Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?
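$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$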
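The steps of this example can be checked with a few lines of code. This is a sketch only; the function and parameter names are illustrative and not part of the course material.

```python
def required_clock_rate_hz(clock_rate_a_hz: float, cpu_time_a_s: float,
                           cycle_factor: float, target_time_s: float) -> float:
    """Clock rate B needs in order to reach target_time_s.

    Clock cycles on A = CPU time A * clock rate A.
    B needs cycle_factor times as many cycles as A.
    Clock rate B = clock cycles B / CPU time B.
    """
    cycles_a = cpu_time_a_s * clock_rate_a_hz
    cycles_b = cycle_factor * cycles_a
    return cycles_b / target_time_s

rate_b = required_clock_rate_hz(clock_rate_a_hz=2e9, cpu_time_a_s=10.0,
                                cycle_factor=1.2, target_time_s=6.0)
print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```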
Instruction Count and CPI
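$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$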
• Instruction Count for a program
• Determined by program, ISA and compiler
• Average cycles per instruction
• Determined by CPU hardware
• If different instructions have different CPI
• Average CPI affected by instruction mix
Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2
• Same architecture
• Which is faster, and by how much?
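$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
So A is faster, by a factor of 1.2.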
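A short numerical check of this comparison. Because both computers run the same program on the same architecture, the instruction count I cancels, so the sketch compares time per instruction. The helper name is an illustrative assumption.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction = CPI * clock cycle time."""
    return cpi * cycle_time_ps

time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)  # Computer A
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)  # Computer B

# Same instruction count on both, so the ratio of CPU times is the ratio
# of the per-instruction times.
ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"{faster} is faster by a factor of {max(ratio, 1 / ratio):.1f}")
```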
More CPI
• If different instruction classes take different numbers of cycles
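$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
The ratio $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.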
Example(Algorithm Design)
• Alternative compiled code sequences using
instructions in classes A, B, C
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
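The weighted-CPI calculation for the two sequences can be written as a short sketch. The dictionary layout and function names are illustrative choices; the CPI values and instruction counts are those of the example.

```python
# CPI of each instruction class, from the example table.
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}

def clock_cycles(instruction_counts: dict) -> int:
    """Sum of CPI_i * instruction count_i over all classes."""
    return sum(CPI_PER_CLASS[cls] * count
               for cls, count in instruction_counts.items())

def average_cpi(instruction_counts: dict) -> float:
    """Clock cycles divided by the total instruction count."""
    return clock_cycles(instruction_counts) / sum(instruction_counts.values())

sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "cycles =", clock_cycles(seq), "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions yet needs fewer clock cycles, so instruction count alone does not decide which sequence is faster.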
Performance Summary
The BIG Picture
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
• Performance depends on
• Algorithm: affects IC, possibly CPI (the effect on IC is direct; however, we
may prefer lighter instructions over heavier ones, e.g. FFT
has a lower CPI than DFT)
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc
Example
• A Java application takes 15s to run. A new
compiler is released that requires only 0.6 times as
many instructions as the old one, but increases the CPI
by a factor of 1.1. What is the time required with the
new compiler?
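One way to approach this exercise is to scale the old CPU time by the factors that changed. The sketch below assumes the clock cycle time is unchanged, which the problem statement implies but does not say outright; the function name is illustrative.

```python
def scaled_cpu_time(old_time_s: float, ic_factor: float, cpi_factor: float,
                    cycle_time_factor: float = 1.0) -> float:
    """CPU time = IC * CPI * cycle time, so each factor scales the time."""
    return old_time_s * ic_factor * cpi_factor * cycle_time_factor

# 0.6 times the instructions, 1.1 times the CPI, same clock (assumed).
new_time = scaled_cpu_time(old_time_s=15.0, ic_factor=0.6, cpi_factor=1.1)
print(f"{new_time:.1f} s")  # prints "9.9 s"
```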
§1.7 The Power Wall
Power Trends
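• For CMOS circuits, power is consumed in charging capacitive
loads while switching.
$$E = \tfrac{1}{2} C V^2$$
$$P = \tfrac{1}{2} C V^2 f$$
• Leakage currents
$$\text{Power} = \tfrac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
• Trend annotations on the equation: power ×30, voltage 5V → 1V, frequency ×1000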
Reducing Power
• Suppose a new CPU has
• 85% of capacitive load of old CPU
• 15% voltage and 15% frequency reduction
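$$\frac{P_\text{new}}{P_\text{old}} = \frac{C_\text{old} \times 0.85 \times (V_\text{old} \times 0.85)^2 \times F_\text{old} \times 0.85}{C_\text{old} \times V_\text{old}^2 \times F_\text{old}} = 0.85^4 = 0.52$$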
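The ratio can be reproduced with the dynamic power relation P = ½ C V² f. In this sketch the old CPU's values are normalized to 1, an arbitrary choice that works because only the ratio matters; the function name is illustrative.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Dynamic CMOS power: 1/2 * C * V^2 * f."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency

# Old CPU normalized to C = V = f = 1 (only the ratio matters).
p_old = dynamic_power(1.0, 1.0, 1.0)
# New CPU: 85% of the capacitive load, 15% lower voltage and frequency.
p_new = dynamic_power(0.85, 0.85, 0.85)

print(f"P_new / P_old = {p_new / p_old:.2f}")  # prints "P_new / P_old = 0.52"
```

Voltage enters squared, so lowering it is the most effective lever, which is why hitting the voltage floor matters so much.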
The power wall
We can’t reduce voltage further
We can’t remove more heat
How else can we improve performance?
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
Constrained by power, instruction-level parallelism,
memory latency
Multiprocessors
• Multicore microprocessors
• More than one processor per chip
• Requires explicitly parallel programming
• Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
• Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
More on this in the following chapters.
SPEC CPU Benchmark
• Programs used to measure performance
• Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
• Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
• Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
• Normalize relative to reference machine
• Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
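The geometric-mean summary can be sketched as follows. The ratios used here are made-up values for illustration, not SPEC results, and the function name is an assumption.

```python
import math

def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n positive ratios.

    Computed through logarithms to avoid overflow for long lists.
    """
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical execution time ratios (reference time / measured time).
example_ratios = [2.0, 8.0]
print(f"geometric mean = {geometric_mean(example_ratios):.2f}")
```

Unlike the arithmetic mean, the geometric mean gives the same relative ranking of machines whichever machine is used as the reference.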
CINT2006 for Intel Core i7 920
SPEC Power Benchmark
• Power consumption of server at different workload levels
• Performance: ssj_ops/sec
• Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Bigg/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
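A minimal sketch of the overall metric: total operations over all load levels divided by total power over the same levels. The three measurements below are invented for illustration; a real run reports eleven levels (0% to 100% in steps of 10%).

```python
def overall_ssj_ops_per_watt(ssj_ops: list, power_watts: list) -> float:
    """Sum of ssj_ops over all load levels / sum of power over the same levels."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("one power reading is needed per load level")
    return sum(ssj_ops) / sum(power_watts)

# Hypothetical readings at 100%, 50% and 0% load (not measured data).
ops = [100.0, 50.0, 0.0]
watts = [10.0, 8.0, 7.0]
print(overall_ssj_ops_per_watt(ops, watts))  # prints 6.0
```

The idle level contributes power but no operations, so a server that draws a lot at idle scores poorly.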
SPECpower_ssj2008 for Xeon X5650
§1.10 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a
proportional improvement in overall performance
$$T_\text{improved} = \frac{T_\text{affected}}{\text{improvement factor}} + T_\text{unaffected}$$
Example: multiply accounts for 80s/100s
How much improvement in multiply performance to
get 5× overall?
$$20 = \frac{80}{n} + 20$$
Can’t be done!
Corollary: make the common case fast
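The multiply example can be explored numerically. This sketch, with illustrative function names, shows that the overall speedup approaches but never reaches 5× however large the improvement factor becomes, because the unaffected 20 s remains.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Amdahl's Law: T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected

def overall_speedup(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Old total time divided by the improved total time."""
    return (t_affected + t_unaffected) / improved_time(t_affected, t_unaffected, factor)

# Multiply takes 80 s of a 100 s run; the other 20 s are unaffected.
for n in (2, 10, 100, 1_000_000):
    print(f"multiply {n}x faster -> overall {overall_speedup(80.0, 20.0, n):.3f}x")
```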
Fallacy: Low Power at Idle
• Look back at i7 power benchmark
• At 100% load: 258W
• At 50% load: 170W (66%)
• At 10% load: 121W (47%)
• Google data center
• Mostly operates at 10% – 50% load
• At 100% load less than 1% of the time
• Consider designing processors to make power
proportional to load
Pitfall: MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
• Doesn’t account for
• Differences in ISAs between computers
• Differences in complexity between instructions
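$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$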
CPI varies between programs on a given CPU
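A small sketch of why MIPS can mislead: the same CPU gets a different MIPS rating for programs with different CPI. The clock rate and CPI values below are made up for illustration.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Same hypothetical 4 GHz CPU, two programs with different average CPI.
for program, cpi in (("program 1", 1.0), ("program 2", 2.0)):
    print(f"{program}: CPI = {cpi}, MIPS = {mips(4e9, cpi):.0f}")
```

A higher MIPS rating also says nothing about how much work each instruction does, so it cannot compare machines with different ISAs.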
§1.9 Concluding Remarks
Concluding Remarks
• Cost/performance is improving
• Due to underlying technology development
• Hierarchical layers of abstraction
• In both hardware and software
• Instruction set architecture
• The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
• Use parallelism to improve performance
Practice Problems
• 1.2, 1.3, 1.4, 1.5, 1.7-15