PARALLEL & DISTRIBUTED COMPUTING CS469
LECTURE # 21
Faizan ul Mustafa
Lecturer | Dept. of Computer Science
GIFT University Gujranwala, Pakistan
faizanulmustafa@gift.edu.pk
Table of Contents
FLOPS, Speed-Up Calculation
Amdahl’s Law
Complexity, cost of complexity, portability, scalability
FLOPS
Floating-point operations per second.
A measure of theoretical peak performance.
Sockets are the physical slots on computer hardware into which CPU chips (and their cores) are installed. Normally there is one socket in a traditional computer; cluster computing machines have multiple sockets.
Sockets: This refers to the number of physical CPU sockets on the
system. Each socket holds one central processing unit (CPU) package.
Cores: This is the number of CPU cores per socket. A core is an
independent processing unit that can execute instructions.
Cycles per second: This is the clock speed of the processor, measured in
Hertz (Hz). One Hertz is equal to one cycle per second.
FLOPs per cycle: This is the number of floating-point operations that a
single core can perform in one clock cycle. This value depends on the
architecture of the processor and the specific instruction being executed.
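Putting these four quantities together gives the standard formula for theoretical peak performance:
Peak FLOPS = sockets * (cores per socket) * (cycles per second) * (FLOPs per cycle)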
FLOPS
Servers are the only computers that sometimes have more than one socket;
for most home computers (desktop or laptop), "sockets" will be 1.
Cores per socket depends on your CPU. It could be 2 (dual-core), 3, 4
(quad-core), 6 (hexa-core), or 8. There are some prototype CPUs with as
many as 80 cores.
"Clock cycles per second" refers to the speed of your CPU. Most modern
CPUs are rated in gigahertz, so 2 GHz would be 2,000,000,000 clock cycles
per second.
The number of FLOPs per cycle also depends on the CPU. One of the fastest
(home computer) CPUs is the Intel Core i7-970, capable of 4 double-precision
or 8 single-precision floating-point operations per cycle.
Test
The Intel Core i7-970 has 6 cores. If it is running at 3.46 GHz and can
perform 8 single-precision floating-point operations per cycle, calculate
the theoretical compute power of this machine.
Solution
The Intel Core i7-970 has 6 cores. If it is running at 3.46 GHz, the formula
would be:
1 (socket) * 6 (cores) * 3,460,000,000 (cycles per second) * 8 (single-precision
FLOPs per cycle) = 166,080,000,000 single-precision FLOPs per second, or,
with 4 double-precision FLOPs per cycle, 83,040,000,000 double-precision
FLOPs per second.
That is a theoretical peak of about 166 x 10^9 single-precision FLOPS,
i.e. roughly 166 GFLOPS.
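A minimal Python sketch of the same calculation (the peak_flops helper is illustrative, not from the slides):

# Theoretical peak FLOPS = sockets * cores per socket * clock (Hz) * FLOPs per cycle
def peak_flops(sockets, cores_per_socket, clock_hz, flops_per_cycle):
    return sockets * cores_per_socket * clock_hz * flops_per_cycle

# Intel Core i7-970: 1 socket, 6 cores, 3.46 GHz
print(peak_flops(1, 6, 3.46e9, 8))  # single precision: 166,080,000,000 FLOPS
print(peak_flops(1, 6, 3.46e9, 4))  # double precision: 83,040,000,000 FLOPS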
Speed-Up Calculations
A machine is designed to execute different processes. Any machine that can
execute more processes simultaneously is more efficient. The ability of a
machine to run multiple processes in parallel in the same time can be
quantified by a speed-up calculation.
A speed-up calculation tells how much, theoretically, we speed up a
particular process / task in execution.
Theoretical speed-up calculation is addressed by Amdahl's Law.
Example
If 30% of the execution time may be the subject of a speedup, p will
be 0.3; if the improvement makes the affected part twice as fast, s
will be 2. Amdahl's law states that the overall speedup of applying
the improvement will be?
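A worked answer, using the standard statement of Amdahl's law (the formula is not on the slide, but is the usual form):
Speedup = 1 / ((1 - p) + p / s) = 1 / ((1 - 0.3) + 0.3 / 2) = 1 / (0.7 + 0.15) = 1 / 0.85 ≈ 1.18
So the improvement speeds up the whole task by about 18%, even though the affected part runs twice as fast.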
Example 2
Assume that we are given a serial task which is split into four
consecutive parts, whose percentages of execution time are p1 =
0.11, p2 = 0.18, p3 = 0.23, and p4 = 0.48 respectively. Then we are
told that the 1st part is not sped up, so s1 = 1, while the 2nd part is
sped up 5 times, so s2 = 5, the 3rd part is sped up 20 times, so s3 =
20, and the 4th part is sped up 1.6 times, so s4 = 1.6. By using
Amdahl's law, the overall speedup is?
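A worked answer, using the generalized form of Amdahl's law for multiple parts (stated here for completeness):
Speedup = 1 / (p1/s1 + p2/s2 + p3/s3 + p4/s4)
= 1 / (0.11/1 + 0.18/5 + 0.23/20 + 0.48/1.6) = 1 / (0.11 + 0.036 + 0.0115 + 0.3) = 1 / 0.4575 ≈ 2.19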
To achieve a better overall speedup, one should speed up the part of the task
that accounts for the larger share of the total execution time, rather than the
part with the largest available speedup. This can be demonstrated with the
following example.
Assume that a task has two independent parts, A and B. Part B takes roughly 25% of the
time of the whole computation. By working very hard, one may be able to make this
part 5 times faster, but this reduces the time of the whole computation only slightly. In
contrast, one may need to perform less work to make part A perform twice as fast. This
will make the computation much faster than by optimizing part B, even though part B's
speedup is greater in terms of the ratio (5 times versus 2 times).
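A short Python sketch of this comparison using Amdahl's law (the amdahl helper is illustrative, not from the slides):

def amdahl(p, s):
    # Overall speedup when a fraction p of the runtime is sped up by a factor s
    return 1.0 / ((1.0 - p) + p / s)

print(amdahl(0.25, 5))  # optimizing part B (25% of runtime, 5x faster): 1.25x overall
print(amdahl(0.75, 2))  # optimizing part A (75% of runtime, 2x faster): 1.6x overall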
Complexity
In general, parallel applications are much more complex than corresponding
serial applications.
Not only do you have multiple instruction streams executing at the same
time, but you also have data flowing between them.
The costs of complexity are measured in programmer time in virtually every
aspect of the software development cycle:
Design
Coding
Debugging
Maintenance
Adhering to "good" software development practices is essential when
working with parallel applications.
Portability
Thanks to standardization in several APIs, such as MPI, POSIX threads, and
OpenMP, portability issues with parallel programs are not as serious as in
years past.
All of the usual portability issues associated with serial programs apply to
parallel programs. For example, if you use vendor "enhancements" to
Fortran, C or C++, portability will be a problem.
Even though standards exist for several APIs, implementations will differ in
a number of details, sometimes to the point of requiring code modifications
in order to effect portability.
Operating systems can play a key role in code portability issues.
Hardware architectures are characteristically highly variable and can affect
portability.
Resource Requirements
The primary intent of parallel programming is to decrease execution wall
clock time, however in order to accomplish this, more CPU time is required.
For example, a parallel code that runs in 1 hour on 8 processors actually
uses 8 hours of CPU time.
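In general (restating the example above, and ignoring parallel overhead):
total CPU time ≈ wall-clock time * number of processors.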
The amount of memory required can be greater for parallel codes than
serial codes, due to the need to replicate data and for overheads associated
with parallel support libraries and subsystems.
For short running parallel programs, there can actually be a decrease in
performance compared to a similar serial implementation. The overhead
costs associated with setting up the parallel environment, task creation,
communications and task termination can comprise a significant portion of
the total execution time for short runs.
Thank You