INTRODUCTION
TO PARALLEL
COMPUTING
WHAT IS PARALLEL COMPUTING?
• Traditionally, software has been written for serial
computation:
• To be run on a single computer having a single Central Processing
Unit (CPU);
• A problem is broken into a discrete series of instructions.
• Instructions are executed one after another.
• Only one instruction may execute at any moment in time.
WHAT IS PARALLEL COMPUTING?
• In the simplest sense, parallel computing is the simultaneous use of multiple
compute resources to solve a computational problem.
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different CPUs
PARALLEL COMPUTING:
RESOURCES
• The compute resources can include:
• A single computer with multiple processors;
• A single computer with (multiple) processor(s) and some specialized
computer resources (GPU, …)
• An arbitrary number of computers connected by a network;
PARALLEL COMPUTING: THE
COMPUTATIONAL PROBLEM
• The computational problem usually demonstrates
characteristics such as the ability to be:
• Broken apart into discrete pieces of work that can be solved
simultaneously;
• Executed as multiple program instructions at any moment in time;
• Solved in less time with multiple compute resources than with a
single compute resource.
PARALLEL COMPUTING: WHAT FOR?
• Parallel computing is an evolution of serial computing that
attempts to emulate many complex, interrelated events
happening at the same time, yet within a sequence.
THE NEED FOR HIGH PERFORMANCE
COMPUTERS
Large scale applications in science and engineering rely on fast processing
Applications requiring high availability rely on parallel and distributed platforms for
redundancy.
Many of today's applications, such as weather prediction, aerodynamics and artificial
intelligence, are very computationally intensive and require vast amounts of processing
power.
So, it appears that the only way forward is to use PARALLELISM. The idea here is that if
several operations can be performed simultaneously then the total computation time is
reduced.
Today, commercial applications provide an equal or greater driving force in
the development of faster computers. These applications require the
processing of large amounts of data in sophisticated ways.
For example:
Databases, data mining
Web search engines, web-based business services
Medical imaging and diagnosis
Pharmaceutical design
Management of national and multi-national corporations
Financial and economic modeling
Advanced graphics and virtual reality, particularly in the entertainment industry
Networked video and multi-media technologies
Collaborative work environments
PARALLEL AND DISTRIBUTED COMPUTING
Parallel computing (processing):
the use of two or more processors (computers), usually within a single system,
working simultaneously to solve a single problem.
Distributed computing (processing):
any computing that involves multiple computers remote from each other that each
have a role in a computation problem or information processing.
Parallel programming:
the human process of developing programs that express what computations should be
executed in parallel.
A parallel computer is a multiple-processor computer capable of parallel
processing.
A supercomputer is a general-purpose computer capable of solving
individual problems at extremely high computational speeds
Ideally, if a single microcomputer takes time t to solve a problem, n computers working
cooperatively in parallel should take time t/n to solve the same problem; this ideal
speedup is the key advantage of parallel computing.
ADVANTAGES
Time Reduction:
With the help of parallel processing, a number of computations can be performed at
once, bringing down the time required to complete a project.
Complexity:
Parallel processing is particularly useful in projects that require complex computations,
such as weather modeling and digital special effects.
Cheapest Possible Solution Strategy:
Many workstations together cost much less than a single large mainframe-style machine.
Greater Reliability:
A parallel computer can keep working even if a processor fails (fault tolerance).
CONCEPTS AND
TERMINOLOGY
VON NEUMANN ARCHITECTURE
• For over 40 years, virtually all computers have followed a
common machine model known as the von Neumann computer,
named after the Hungarian mathematician John von Neumann.
• A von Neumann computer uses the stored-program
concept. The CPU executes a stored program that specifies
a sequence of read and write operations on the memory.
BASIC DESIGN
• Basic design
• Memory is used to store both program instructions and data
• Program instructions are coded data which tell the
computer to do something
• Data is simply information to be used by the
program
• A central processing unit (CPU) gets
instructions and/or data from memory,
decodes the instructions and then
sequentially performs them.
FLYNN'S CLASSICAL TAXONOMY
• There are different ways to classify parallel computers. One
of the more widely used classifications, in use since 1966, is
called Flynn's Taxonomy.
• Flynn's taxonomy distinguishes multi-processor computer
architectures according to how they can be classified along
the two independent dimensions of Instruction and Data.
Each of these dimensions can have only one of two possible
states: Single or Multiple.
FLYNN MATRIX
• The matrix below defines the 4 possible classifications
according to Flynn
SINGLE INSTRUCTION, SINGLE
DATA (SISD)
• A serial (non-parallel) computer
• Single instruction: only one instruction stream is being
acted on by the CPU during any one clock cycle
• Single data: only one data stream is being used as input
during any one clock cycle
• Deterministic execution
• This is the oldest and until recently, the most prevalent
form of computer
• Examples: most PCs, single CPU workstations and
mainframes
SINGLE INSTRUCTION, MULTIPLE DATA (SIMD)
• A type of parallel computer
• Single instruction: All processing units execute the same instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a different data element
• This type of machine typically has an instruction dispatcher, a very high-bandwidth internal network,
and a very large array of small-capacity instruction units.
• Best suited for specialized problems characterized by a high degree of regularity, such as image
processing.
• Synchronous (lockstep) and deterministic execution
• Examples:
• Processor Arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2, ILLIAC IV
• Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10
• Most modern computers, particularly those with graphics processor units (GPUs) employ SIMD
instructions and execution units.
MULTIPLE INSTRUCTION, SINGLE DATA (MISD)
• A single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via independent
instruction streams.
• The MISD architecture is also known as a systolic array.
• Systolic arrays are hardware structures built for fast and efficient operation of
regular algorithms that perform the same task with different data at different
time instants.
MULTIPLE INSTRUCTION, MULTIPLE DATA (MIMD)
• Currently, the most common type of parallel computer. Most modern
computers fall into this category.
• Multiple Instruction: every processor may be executing a different instruction
stream
• Multiple Data: every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-
deterministic
• Examples: most current supercomputers, networked parallel computer "grids"
and multi-processor SMP computers, multi-core PCs.
GENERAL PARALLEL TERMINOLOGIES
• CPU
Contemporary CPUs consist of one or more cores - a distinct execution unit with its own
instruction stream. Cores within a CPU may be organized into one or more sockets, each socket
with its own distinct memory. When a CPU consists of two or more sockets, the hardware
infrastructure usually supports memory sharing across sockets.
• Node
A standalone "computer in a box." Usually comprised of multiple CPUs/processors/cores, memory,
network interfaces, etc. Nodes are networked together to comprise a supercomputer.
• Task
A logically discrete section of computational work. A task is typically a program or program-like set
of instructions that is executed by a processor.
• Parallel Task
A task that can be executed by multiple processors safely (yields correct results).
• Serial Execution
Execution of a program sequentially, one statement at a time. In the simplest sense, this is what
happens on a one-processor machine. However, virtually all parallel programs have sections that
must be executed serially.
• Parallel Execution
• Execution of a program by more than one task, with each task being able to
execute the same or different statement at the same moment in time.
• Shared Memory
• From a strictly hardware point of view, describes a computer architecture
where all processors have direct (usually bus based) access to common
physical memory. In a programming sense, it describes a model where parallel
tasks all have the same "picture" of memory and can directly address and
access the same logical memory locations regardless of where the physical
memory actually exists.
• Distributed Memory
• In hardware, refers to network based memory access for physical memory
that is not common. As a programming model, tasks can only logically "see"
local machine memory and must use communications to access memory on
other machines where other tasks are executing.
• Communications
• Parallel tasks typically need to exchange data. There are several ways this can
be accomplished, such as through a shared memory bus or over a network,
however the actual event of data exchange is commonly referred to as
communications regardless of the method employed.
• Synchronization
• The coordination of parallel tasks in real time, very often associated with
communications. Often implemented by establishing a synchronization point
within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point.
• Synchronization usually involves waiting by at least one task, and can
therefore cause a parallel application's wall clock execution time to increase.
• Granularity
• In parallel computing, granularity (or grain size) of a task is a measure of the
amount of work (or computation) which is performed by that task.
• Granularity is a quantitative or qualitative measure of the ratio of computation time
to communication time (a simple form of this ratio is given below).
• In fine-grained parallelism, a program is broken down into a large number of
small tasks. These tasks are assigned individually to many processors. The
amount of work associated with a parallel task is low and the work is evenly
distributed among the processors. Hence, fine-grained parallelism facilitates load
balancing.
• In coarse-grained parallelism, a program is split into large tasks. Due to this,
a large amount of computation takes place in each processor. This might result in
load imbalance, wherein certain tasks process the bulk of the data while others
might be idle.
• Granularity is closely tied to the levels of processing
1. Instruction level
2. Loop level
3. Sub-routine level and
4. Program-level
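A simple quantitative form of the computation-to-communication ratio mentioned above (symbols assumed here, not from the slide):

\[ \text{granularity} = \frac{T_{\text{computation}}}{T_{\text{communication}}} \]

Fine-grained tasks have a low ratio (little work between communications); coarse-grained tasks have a high one.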
• Observed Speedup
• Observed speedup of a code which has been parallelized,
defined as:
(wall-clock time of serial execution) / (wall-clock time of parallel execution)
• One of the simplest and most widely used indicators for a
parallel program's performance.
• Parallel Overhead
• The amount of time required to coordinate parallel tasks, as opposed to doing
useful work. Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
• Task termination time
• Scalability
• Refers to a parallel system's (hardware and/or software) ability to demonstrate a
proportionate increase in parallel speedup with the addition of more processors. Factors that
contribute to scalability include:
• Hardware - particularly memory-CPU bandwidth and network communications
• Application algorithm
• Parallel overhead
• Characteristics of your specific application and coding
PARALLEL
COMPUTER
MEMORY
ARCHITECTURES
MEMORY ARCHITECTURES
• Shared Memory
• Distributed Memory
• Hybrid Distributed-Shared Memory
SHARED MEMORY
• Shared memory parallel computers vary widely, but generally have in common the
ability for all processors to access all memory as global address space.
• Multiple processors can operate independently but share the same memory
resources.
• Changes in a memory location effected by one processor are visible to all other
processors.
• Shared memory machines can be divided into two main classes based upon memory
access times: UMA and NUMA.
SHARED MEMORY : UMA VS. NUMA
• Uniform Memory Access (UMA):
• Most commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors
• Equal access and access times to memory
• Sometimes called CC-UMA - Cache Coherent UMA. Cache coherent means if one
processor updates a location in shared memory, all the other processors know about
the update. Cache coherency is accomplished at the hardware level.
• Non-Uniform Memory Access (NUMA):
• Often made by physically linking two or more SMPs
• One SMP can directly access memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across link is slower
• If cache coherency is maintained, then may also be called CC-NUMA - Cache
Coherent NUMA
SHARED MEMORY: PROS AND CONS
• Advantages
• Global address space provides a user-friendly programming perspective to memory
• Data sharing between tasks is both fast and uniform due to the proximity of memory
to CPUs
• Disadvantages:
• Primary disadvantage is the lack of scalability between memory and CPUs. Adding
more CPUs can geometrically increase traffic on the shared memory-CPU path and,
for cache coherent systems, geometrically increase traffic associated with
cache/memory management.
• The programmer is responsible for synchronization constructs that ensure "correct" access
to global memory.
• Expense: it becomes increasingly difficult and expensive to design and produce shared
memory machines with ever increasing numbers of processors.
DISTRIBUTED MEMORY
• Like shared memory systems, distributed memory systems vary widely but share a
common characteristic: distributed memory systems require a communication network
to connect inter-processor memory.
• Processors have their own local memory. Memory addresses in one processor do not
map to another processor, so there is no concept of global address space across all
processors.
• Because each processor has its own local memory, it operates independently. Changes it
makes to its local memory have no effect on the memory of other processors. Hence,
the concept of cache coherency does not apply.
• When a processor needs access to data in another processor, it is usually the task of the
programmer to explicitly define how and when data is communicated. Synchronization
between tasks is likewise the programmer's responsibility.
DISTRIBUTED MEMORY: PROS AND CONS
• Advantages
• Memory is scalable with number of processors. Increase the number of processors
and the size of memory increases proportionately.
• Each processor can rapidly access its own memory without interference and
without the overhead incurred with trying to maintain cache coherency.
• Cost effectiveness: can use commodity, off-the-shelf processors and networking.
• Disadvantages
• The programmer is responsible for many of the details associated with data
communication between processors.
• It may be difficult to map existing data structures, based on global memory, to this
memory organization.
• Non-uniform memory access (NUMA) times
HYBRID DISTRIBUTED-SHARED MEMORY
• The largest and fastest computers in the world today employ both shared and
distributed memory architectures.
• The shared memory component is usually a cache coherent SMP machine.
Processors on a given SMP can address that machine's memory as global.
• The distributed memory component is the networking of multiple SMPs. SMPs
know only about their own memory - not the memory on another SMP.
Therefore, network communications are required to move data from one SMP
to another.
• Current trends seem to indicate that this type of memory architecture will
continue to prevail and increase at the high end of computing for the
foreseeable future.
• Advantages and Disadvantages: whatever is common to both shared and
distributed memory architectures.
PARALLEL
PROGRAMMING
MODELS
OVERVIEW
• There are several parallel programming models in common
use:
• Shared Memory
• Threads
• Message Passing
• Data Parallel
• Hybrid
• Parallel programming models exist as an abstraction above
hardware and memory architectures.
OVERVIEW
• Which model to use is often a combination of what is
available and personal choice. There is no "best" model,
although there certainly are better implementations of
some models over others.
• The following sections describe each of the models
mentioned above, and also discuss some of their actual
implementations.
SHARED MEMORY MODEL
• In the shared-memory programming model, tasks share a common
address space, which they read and write asynchronously.
• Various mechanisms such as locks / semaphores may be used to
control access to the shared memory.
• An advantage of this model from the programmer's point of view is
that the notion of data "ownership" is lacking, so there is no need to
specify explicitly the communication of data between tasks.
• An important disadvantage in terms of performance is that it
becomes more difficult to understand and manage data locality.
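A minimal sketch of this model on a POSIX system (code written for illustration, not taken from the slides): two processes map the same region of memory, and a process-shared semaphore acts as the lock that controls access to it. Compile with something like cc -pthread shmodel.c.

/* Shared memory model sketch: two tasks, one common address space, one lock. */
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    sem_t lock;     /* semaphore used as a mutual-exclusion lock */
    long  counter;  /* shared data visible to both tasks         */
} shared_t;

int main(void) {
    /* One region of memory that both tasks see at the same logical addresses. */
    shared_t *shm = mmap(NULL, sizeof(shared_t), PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shm == MAP_FAILED) return 1;
    sem_init(&shm->lock, 1 /* shared between processes */, 1);
    shm->counter = 0;

    pid_t pid = fork();                  /* now there are two tasks            */
    for (int i = 0; i < 100000; i++) {
        sem_wait(&shm->lock);            /* acquire the lock ...               */
        shm->counter++;                  /* ... update the shared location ... */
        sem_post(&shm->lock);            /* ... and release it                 */
    }
    if (pid != 0) {                      /* parent waits for the child         */
        wait(NULL);
        printf("counter = %ld (expected 200000)\n", shm->counter);
    }
    return 0;
}

Without the semaphore the two increments could interleave and lose updates, which is exactly the access-control problem the locks/semaphores bullet refers to.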
SHARED MEMORY MODEL:
IMPLEMENTATIONS
• On shared memory platforms, the native compilers translate user
program variables into actual memory addresses, which are global.
• On distributed memory platforms, no common implementations of this model
currently exist; such an implementation would have to provide a shared memory view
of data on top of physically distributed memory.
THREADS MODEL
• In the threads model of parallel programming, a single process can have
multiple, concurrent execution paths.
• Perhaps the simplest analogy that can be used to describe threads is
the concept of a single program that includes a number of subroutines running concurrently.
• Threads are commonly associated with shared memory architectures and
operating systems.
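A small sketch of the threads model using POSIX threads (one possible implementation, chosen here for illustration): a single process spawns several concurrent execution paths that all share the process's global data, with a mutex serializing the update of the shared result. Compile with something like cc -pthread threads.c.

/* Threads model sketch: one process, several concurrent execution paths. */
#include <pthread.h>
#include <stdio.h>

static long shared_sum = 0;                       /* global data shared by all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *partial_sum(void *arg) {             /* the "subroutine" each thread runs */
    long id = (long)arg, local = 0;
    for (long i = id * 1000; i < (id + 1) * 1000; i++)
        local += i;                               /* work on this thread's own range   */
    pthread_mutex_lock(&lock);                    /* protect the shared accumulator    */
    shared_sum += local;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t tid[4];
    for (long t = 0; t < 4; t++)                  /* spawn concurrent execution paths  */
        pthread_create(&tid[t], NULL, partial_sum, (void *)t);
    for (int t = 0; t < 4; t++)
        pthread_join(tid[t], NULL);               /* wait for all paths to finish      */
    printf("sum of 0..3999 = %ld\n", shared_sum); /* 7998000 */
    return 0;
}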
THREADS MODEL: OPENMP
• OpenMP
• Jointly defined and endorsed by a group of major computer hardware and
software vendors.
• Portable / multi-platform, including Unix and Windows NT platforms
• Available in C/C++ implementations
• Can be very easy and simple to use - provides for "incremental parallelism"
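A minimal illustration of the "incremental parallelism" point (loop and variable names chosen here for illustration): adding a single directive turns an otherwise serial loop into a parallel one. Build with an OpenMP-enabled compiler, e.g. a -fopenmp style flag.

/* OpenMP sketch: one pragma parallelizes the loop; remove it and the code is serial. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* reduction(+:sum) gives every thread a private partial sum and
       combines them at the end, avoiding a data race on sum.        */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("partial harmonic sum = %f (up to %d threads)\n",
           sum, omp_get_max_threads());
    return 0;
}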
MESSAGE PASSING MODEL
IMPLEMENTATIONS: MPI
• From a programming perspective, message passing implementations commonly
comprise a library of subroutines that are embedded in source code. The
programmer is responsible for determining all parallelism.
• Historically, a variety of message passing libraries have been available since the
1980s. These implementations differed substantially from each other making it
difficult for programmers to develop portable applications.
• In 1992, the MPI Forum was formed with the primary goal of establishing a
standard interface for message passing implementations.
• Part 1 of the Message Passing Interface (MPI) was released in 1994. Part 2
(MPI-2) was released in 1996. Both MPI specifications are available on the web at
www.mcs.anl.gov/Projects/mpi/standard.html.
MESSAGE PASSING MODEL
IMPLEMENTATIONS: MPI
• MPI is now the "de facto" industry standard for message passing, replacing
virtually all other message passing implementations used for production work.
Most, if not all, of the popular parallel computing platforms offer at least one
implementation of MPI. A few offer a full implementation of MPI-2.
• For shared memory architectures, MPI implementations usually don't use a
network for task communications. Instead, they use shared memory (memory
copies) for performance reasons.
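A minimal message-passing sketch with MPI (values and tag chosen here for illustration): rank 1 sends one integer to rank 0 over MPI_COMM_WORLD. Build and launch with the platform's MPI tools, e.g. an mpicc / mpirun pair, using at least two tasks.

/* Message passing sketch: explicit send on one task, matching receive on another. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which task am I?         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many tasks in total? */

    if (rank == 1) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);      /* explicit send    */
    } else if (rank == 0) {
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* matching receive */
        printf("rank 0 of %d received %d from rank 1\n", size, value);
    }

    MPI_Finalize();
    return 0;
}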
OTHER MODELS
• Other parallel programming models besides those
previously mentioned certainly exist, and will continue to
evolve along with the ever changing world of computer
hardware and software.
• Only three of the more common ones are mentioned here.
• Hybrid
• Single Program Multiple Data
• Multiple Program Multiple Data
HYBRID
• In this model, any two or more parallel programming models are
combined.
• Currently, a common example of a hybrid model is the combination of
the message passing model (MPI) with the shared memory model
(OpenMP). This hybrid model lends itself well to the increasingly
common hardware environment of networked SMP machines.
• Another common example of a hybrid model is combining data
parallel with message passing. The data parallel implementations on
distributed memory architectures actually use message passing to
transmit data between tasks, transparently to the programmer.
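A sketch of the MPI + OpenMP combination described above (structure chosen here for illustration): OpenMP threads build a per-task partial result inside each MPI task, and message passing then combines the partial results across tasks.

/* Hybrid sketch: shared memory (OpenMP) within a task, message passing (MPI) between tasks. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    /* Shared-memory parallelism inside this MPI task */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    double total = 0.0;
    /* Message passing combines the per-task results on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("grand total = %.0f\n", total);
    MPI_Finalize();
    return 0;
}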
SINGLE PROGRAM MULTIPLE DATA
(SPMD)
• Single Program Multiple Data (SPMD):
• SPMD is actually a "high level" programming model that can be built upon any
combination of the previously mentioned parallel programming models.
• A single program is executed by all tasks simultaneously.
• At any moment in time, tasks can be executing the same or different instructions
within the same program.
• SPMD programs usually have the necessary logic programmed into them to allow
different tasks to branch or conditionally execute only those parts of the
program they are designed to execute. That is, tasks do not necessarily have to
execute the entire program - perhaps only a portion of it.
• All tasks may use different data
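A small SPMD sketch built on MPI (roles chosen here for illustration): every task runs this same program, but branches on its rank so that each task executes only the part it is responsible for, typically on its own portion of the data.

/* SPMD sketch: one program, many tasks, rank-dependent behaviour. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        printf("task 0: handling I/O and coordination\n");   /* only rank 0 runs this  */
    } else {
        printf("task %d of %d: computing its own slice\n",   /* the rest run this part */
               rank, size);
    }

    MPI_Finalize();
    return 0;
}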
MULTIPLE PROGRAM MULTIPLE
DATA (MPMD)
• Multiple Program Multiple Data (MPMD):
• Like SPMD, MPMD is actually a "high level" programming model that
can be built upon any combination of the previously mentioned
parallel programming models.
• MPMD applications typically have multiple executable object files
(programs). While the application is being run in parallel, each task can
be executing the same or different program as other tasks.
• All tasks may use different data
MULTI CORE SYSTEMS
• A multi-core processor is a single computing component with two or more independent actual CPUs
(called "cores").
• These independent CPUs are capable of executing instructions at the same time hence overall increasing
the speed at which programs can be executed.
• Manufacturers typically integrate the cores onto a single integrated circuit die.
• Processors were originally developed with only one core.
• A dual-core processor has two cores (e.g. Intel Core Duo)
• A quad-core processor contains four cores (e.g. Intel's quad-core i3, i5, and i7 processors)
• A hexa-core processor contains six cores (e.g. Intel Core i7 Extreme Edition 980X)
• An octo-core (or octa-core) processor contains eight cores (e.g. Intel Xeon E7-2820)
• A deca-core processor contains ten cores (e.g. Intel Xeon E7-2850)
PERFORMANCE
Two key goals to be achieved with the design of parallel applications are:
• Performance – the capacity to reduce the time needed to solve a problem as
the computing resources increase
• Scalability – the capacity to increase performance as the size of the problem
increases
The main factors limiting the performance and the scalability of an application can
be divided into:
• Architectural limitations
• Algorithmic limitations
PERFORMANCE METRICS FOR
PARALLEL APPLICATIONS
Some of the best known metrics are:
• Speedup
• Efficiency
• Redundancy
• Utilization
There are also some laws that try to explain and predict the potential
performance of a parallel application. The best known are:
• Amdahl’s law
• Gustafson-Barsis’ law
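In their usual textbook forms, with p processing units and f the fraction of the work that can be parallelized (symbols assumed here), the two laws read:

\[ S_{\text{Amdahl}}(p) = \frac{1}{(1-f) + f/p}, \qquad S_{\text{Gustafson}}(p) = p - (1-f)(p-1) \]

Amdahl's law keeps the problem size fixed and bounds the speedup by 1/(1-f); Gustafson-Barsis' law lets the problem size grow with p.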
SPEEDUP
Speedup is a measure of performance. It measures the ratio between the sequential
execution time and the parallel execution time.
Here, T(1) is the execution time with one processing unit and T(p) is the execution time
with p processing units.
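Written as a formula:

\[ S(p) = \frac{T(1)}{T(p)} \]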
EFFICIENCY
Efficiency is a measure of the usage of the computational capacity. It measures the
ratio between performance and the number of resources available to achieve that
performance.
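In formula form, with S(p), T(1) and T(p) as defined for speedup:

\[ E(p) = \frac{S(p)}{p} = \frac{T(1)}{p\,T(p)} \]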
REDUNDANCY
Redundancy measures the increase in the required computation when using more
processing units. It measures the ratio between the number of operations performed
by the parallel execution and by the sequential execution.
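In formula form, writing O(p) for the total number of operations performed when p processing units are used (notation assumed here, not from the slide):

\[ R(p) = \frac{O(p)}{O(1)} \]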
UTILIZATION
Utilization is a measure of how well the computational capacity is used. It measures
the ratio between the computational capacity used during parallel execution and the
capacity that was available.
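With the same assumed notation, utilization is commonly expressed as the product of redundancy and efficiency:

\[ U(p) = R(p)\,E(p) \]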
PIPELINING
PIPELINING
• A technique of decomposing a sequential process into
suboperations, with each subprocess being executed in a
special dedicated segment that operates concurrently with
all other segments.
• A pipeline can be visualized as a collection of processing
segments through which binary information flows.
• The name “pipeline” implies a flow of information analogous
to an industrial assembly line.
PIPELINING IS NATURAL!
• Laundry Example
• Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes
to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
SEQUENTIAL LAUNDRY
[Timeline figure, 6 PM to midnight: loads A, B, C and D are processed one after another in task order, each taking 30 min wash + 40 min dry + 20 min fold.]
• Sequential laundry takes 6 hours for 4 loads
PIPELINED LAUNDRY: START WORK ASAP
[Timeline figure, 6 PM onward: loads A, B, C and D overlap; after the first 30 min wash, the 40 min dryer stage runs back-to-back for all four loads, ending with a final 20 min fold.]
• Pipelined laundry takes 3.5 hours for 4 loads
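The two totals follow directly from the stage times (30 min wash, 40 min dry, 20 min fold):

\[ T_{\text{sequential}} = 4 \times (30 + 40 + 20) = 360\ \text{min} = 6\ \text{h}, \qquad T_{\text{pipelined}} = 30 + 4 \times 40 + 20 = 210\ \text{min} = 3.5\ \text{h} \]

In the pipelined case the 40-minute dryer is the slowest stage, so it alone sets the rate at which loads complete.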
PIPELINING LESSONS
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup
IDEA OF PIPELINING IN COMPUTER
ROLE OF CACHE MEMORY
PIPELINE PERFORMANCE
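A standard way to quantify pipeline performance (symbols assumed here): with k stages, each taking one clock period t_p, the first task needs k cycles and every later task one more, so n tasks complete in (k + n - 1) cycles. Relative to a non-pipelined unit that needs time t_n per task, the speedup is

\[ S = \frac{n\,t_n}{(k + n - 1)\,t_p}, \qquad S \to \frac{t_n}{t_p} \approx k \ \ \text{as}\ n \to \infty \]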
PIPELINING AND PARALLEL PROCESSING
• Pipelining and parallel processing are both techniques used in computer architecture to improve the performance
of processors.
• Pipelining involves breaking down the execution of instructions into a series of stages. Each stage performs a
different part of the instruction, and multiple instructions can be processed simultaneously, each at a different
stage of execution. This allows for more efficient use of the processor's resources and can lead to faster overall
execution of instructions.
• Parallel processing, on the other hand, involves the simultaneous execution of multiple instructions or tasks. This
can be done using multiple processors or processor cores working together to handle different parts of a task at
the same time. Parallel processing can significantly speed up the execution of tasks that can be divided into
independent sub-tasks.
• In summary, pipelining improves the efficiency of processing individual instructions, while parallel processing
improves the overall throughput by executing multiple instructions or tasks simultaneously. Both techniques are
used to enhance the performance of modern computer systems.