2 Performance
Computer Architecture
What does “Faster” Mean?
Response Time
• Time spent to complete an event
• Also referred to as latency or execution time.
Throughput
• Amount of work done in a given time
• Also referred to as bandwidth
• In general, a shorter response time also improves throughput
Execution Time and Performance
Quantitatively, execution time is inversely proportional to performance
• A decrease in execution time improves performance
Machine X is n times faster than Y means
$$n = \frac{P_x}{P_y} = \frac{t_y}{t_x}$$
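As a quick check of the ratio above, the following minimal Python sketch computes n from two execution times. The function name `times_faster` and the sample times are illustrative assumptions, not part of the lecture.

```python
def times_faster(t_x: float, t_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y.

    Performance is taken as 1 / execution time, so n = P_x / P_y = t_y / t_x.
    """
    if t_x <= 0 or t_y <= 0:
        raise ValueError("execution times must be positive")
    return t_y / t_x


# Hypothetical measurements: X finishes in 10 s, Y in 15 s.
n = times_faster(t_x=10.0, t_y=15.0)
print(f"X is {n:.2f} times faster than Y")  # prints "X is 1.50 times faster than Y"
```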
Make the Common Case Fast
A rule of thumb in computer design is to speed up the events that
occur most frequently
Design trade-off: favour the frequent case over the infrequent case
In general, this should increase overall performance
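The trade-off can be illustrated with a small Python sketch. It assumes a hypothetical 100-second run in which the frequent case accounts for 90% of the time; the function name `total_time` and all numbers are made up for illustration.

```python
def total_time(frequent_fraction: float, frequent_speedup: float,
               rare_speedup: float, base_time: float = 100.0) -> float:
    """Run time after speeding up the frequent and the rare case independently."""
    frequent = base_time * frequent_fraction / frequent_speedup
    rare = base_time * (1.0 - frequent_fraction) / rare_speedup
    return frequent + rare


# Hypothetical workload: the frequent case takes 90% of a 100 s run.
after_frequent = total_time(0.9, frequent_speedup=2.0, rare_speedup=1.0)
after_rare = total_time(0.9, frequent_speedup=1.0, rare_speedup=2.0)
print(f"Frequent case 2x faster: {after_frequent:.1f} s")  # prints "Frequent case 2x faster: 55.0 s"
print(f"Rare case 2x faster: {after_rare:.1f} s")          # prints "Rare case 2x faster: 95.0 s"
```

The same 2x improvement saves 45 seconds when applied to the frequent case and only 5 seconds when applied to the rare one.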
Amdahl’s Law
The performance improvement to be gained from using some faster
mode of operation is limited by the fraction of time that the faster
mode can be used.
Speedup due to enhancement E
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
Factors Affecting the Speedup
The fraction of computation time in the original machine that can be
converted to take advantage of the enhancement
$$\text{Fraction}_{\text{enhanced}} \le 1$$
The improvement gained by the enhanced execution mode, i.e. how
much faster the task would run if the enhanced mode were used for
the entire program.
$$\text{Speedup}_{\text{enhanced}} > 1$$
Applying Amdahl’s Law
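$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \left( 1 - \text{Fraction}_{\text{enhanced}} + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}} + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$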
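The two formulas above translate directly into code. This is a minimal Python sketch; the function names `amdahl_speedup` and `new_execution_time` and the sample inputs are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup: 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def new_execution_time(old_time: float, fraction_enhanced: float,
                       speedup_enhanced: float) -> float:
    """Execution time with the enhancement: old_time * ((1 - F) + F / S)."""
    return old_time / amdahl_speedup(fraction_enhanced, speedup_enhanced)


# Hypothetical case: half of the run time benefits from a 2x enhancement.
print(f"{amdahl_speedup(0.5, 2.0):.3f}")              # prints "1.333"
print(f"{new_execution_time(60.0, 0.5, 2.0):.1f} s")  # prints "45.0 s"
```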
Using Amdahl’s Law: Example
Suppose that we are considering an enhancement that runs 10 times
faster than the original machine but is only usable 40% of the time.
What is the overall speedup gained by incorporating the enhancement?
Answer:
$$\text{Fraction}_{\text{enhanced}} = 0.4$$

$$\text{Speedup}_{\text{enhanced}} = 10$$

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$
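To see why the overall result is so far below 10, it may help to consider the limiting case in which the enhanced mode becomes infinitely fast. The bound below follows from the overall speedup formula with the fraction from this example; it is an added illustration rather than part of the original answer.

```latex
\lim_{\text{Speedup}_{\text{enhanced}} \to \infty} \text{Speedup}_{\text{overall}}
  = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}
  = \frac{1}{0.6} \approx 1.67
```

With only 40% of the time enhanceable, no enhancement can push the overall speedup past roughly 1.67, so the 10x mode already delivers most of what is attainable.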
Processing Speed: CPU Clock
A circuit that generates a signal at regular time intervals or cycles
during which basic CPU tasks are performed
Provides control as to when each step of an instruction is executed.
Clock Cycles
One clock pulse is a burst of current
when the clock output is 1
A clock cycle is the interval from
the beginning of one pulse to the
beginning of the next
Clock rate is measured in hertz (Hz),
the unit of frequency
1 Hz = 1 cycle/second
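Since 1 Hz is one cycle per second, the duration of a single clock cycle is the reciprocal of the clock rate. The short Python sketch below shows the conversion; the function name `cycle_time_seconds` and the 2 GHz figure are assumptions chosen for illustration.

```python
def cycle_time_seconds(clock_rate_hz: float) -> float:
    """Duration of one clock cycle in seconds: 1 / clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


# Hypothetical 2 GHz clock (2e9 cycles per second).
period = cycle_time_seconds(2e9)
print(f"{period * 1e9:.2f} ns per cycle")  # prints "0.50 ns per cycle"
```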
Locality of Reference
Programs tend to reuse data and instructions used recently
A program may spend 90% of its execution time in only 10% of the
code
• Based on a program’s recent past, one can predict with reasonable accuracy
what instructions and data will be used in the near future.
Two Types of Locality
Temporal Locality
• Recently accessed items are likely to be accessed in the near future.
Spatial Locality
• Items whose addresses (or location) are near one another tend to be
referenced close together in time
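Both kinds of locality can be made visible with a toy model. The Python sketch below counts hits in a small fully associative cache with least-recently-used replacement. The cache model, the function name `hit_rate`, the sizes, and the three address traces are all assumptions chosen for illustration; the slides do not describe a specific cache.

```python
from collections import OrderedDict


def hit_rate(addresses, num_blocks=8, block_size=4):
    """Fraction of accesses that hit in a tiny fully associative LRU cache."""
    if not addresses:
        return 0.0
    cache = OrderedDict()  # keys are block numbers, least recently used first
    hits = 0
    for addr in addresses:
        block = addr // block_size  # neighbouring addresses share a block
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None
    return hits / len(addresses)


# Temporal locality: the same 4 addresses reused 25 times.
temporal = [0, 100, 200, 300] * 25
# Spatial locality: 100 consecutive addresses, each touched once.
spatial = list(range(100))
# Little locality: 100 distinct addresses spaced far apart.
scattered = [i * 64 for i in range(100)]

print(f"{hit_rate(temporal):.2f}")   # prints "0.96"
print(f"{hit_rate(spatial):.2f}")    # prints "0.75"
print(f"{hit_rate(scattered):.2f}")  # prints "0.00"
```

In this model the temporal trace misses only on its first four accesses, the sequential trace misses once per 4-address block, and the scattered trace never reuses a block.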
Benchmarks
Most accurate
• Real programs
Simpler, but discredited
• Kernels – small, key pieces of real applications
• Toy programs – programs of about 100 lines from basic programming
assignments (e.g. quicksort)
• Synthetic benchmarks – artificial programs (e.g. Dhrystone) that try to match
the behaviour of real applications
Why do benchmarks sometimes produce conflicting results?
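One possible reason is that each machine may be faster on a different program, so the ranking depends on which programs are run and how often. The Python sketch below illustrates this with made-up execution times; the machines, programs, timings, and function names are all hypothetical.

```python
# Hypothetical execution times in seconds; not measurements of real machines.
times = {
    "A": {"prog1": 1.0, "prog2": 100.0},
    "B": {"prog1": 10.0, "prog2": 50.0},
}


def faster_machine(program):
    """Machine with the lower execution time on a single program."""
    return min(times, key=lambda machine: times[machine][program])


def workload_time(machine, runs):
    """Total time for a workload given as {program: number of runs}."""
    return sum(count * times[machine][program] for program, count in runs.items())


def best_for_workload(runs):
    """Machine with the lower total time for the given workload."""
    return min(times, key=lambda machine: workload_time(machine, runs))


print(faster_machine("prog1"))                      # prints "A"
print(faster_machine("prog2"))                      # prints "B"
print(best_for_workload({"prog1": 1, "prog2": 1}))  # prints "B"
print(best_for_workload({"prog1": 9, "prog2": 1}))  # prints "A"
```

With one run of each program, B finishes in 60 s against 101 s for A; with nine runs of prog1 per run of prog2, A finishes in 109 s against 140 s for B. Neither ranking is wrong; they describe different workloads.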
Reference
• Hennessy, J. L., & Patterson, D. A. (2011). Computer architecture: a
quantitative approach. Elsevier.
• IT220 Computer Organization Lecture Slides (AdDU MSIT Class), Prof.
Ariel Maguyon, AdMU Professor.