Parallel Design and Programming
Von Neumann Architecture
• Comprised of four main components:
  – Memory
  – Control Unit
  – Arithmetic Logic Unit
  – Input / Output
• Read/write, random access memory is used to store both program instructions and data
  – Program instructions are coded data which tell the computer to do something
  – Data is simply information to be used by the program
• The Control Unit fetches instructions/data from memory, decodes the instructions and then sequentially coordinates operations to accomplish the programmed task.
• The Arithmetic Logic Unit performs basic arithmetic operations.
• Input/Output is the interface to the human operator.
Parallel computers still follow this basic design, just multiplied in units. The basic, fundamental architecture remains the same.
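The fetch-decode-execute cycle above can be sketched as a tiny interpreter. This is not from the slides; the opcodes, memory layout and program below are invented purely for illustration, but they show the key idea: one memory holds both instructions and data, and a control loop steps through it sequentially.

```c
/* Illustrative sketch of the von Neumann cycle: one memory array holds
   both the program and its data; a control loop fetches, decodes and
   executes instructions one after another. Opcodes are hypothetical. */
#include <stdio.h>

enum { OP_HALT = 0, OP_ADD = 1, OP_PRINT = 2 };

int main(void) {
    /* Program and data share the same memory. */
    int memory[16] = {
        OP_ADD, 10, 11, 12,   /* mem[12] = mem[10] + mem[11] */
        OP_PRINT, 12,         /* print mem[12]               */
        OP_HALT,
        0, 0, 0,
        2, 3, 0               /* data at addresses 10..12    */
    };
    int pc = 0;                        /* program counter */
    for (;;) {
        int op = memory[pc];           /* fetch + decode  */
        if (op == OP_ADD) {            /* execute         */
            memory[memory[pc + 3]] = memory[memory[pc + 1]] + memory[memory[pc + 2]];
            pc += 4;
        } else if (op == OP_PRINT) {
            printf("%d\n", memory[memory[pc + 1]]);
            pc += 2;
        } else {
            break;                     /* OP_HALT */
        }
    }
    return 0;
}
```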
Flynn's Classical Taxonomy
Flynn's taxonomy classifies multi-processor computer architectures along two independent dimensions: Instruction Stream and Data Stream. Each of these dimensions can have only one of two possible states: Single or Multiple.
• The matrix below defines the 4 possible classifications according to Flynn:

                        Single Data    Multiple Data
  Single Instruction       SISD            SIMD
  Multiple Instruction     MISD            MIMD
Single Instruction, Single Data (SISD):
1. A serial (non-parallel) computer
2. Single Instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
3. Single Data: only one data stream is being used as input during any one clock cycle
4. Deterministic execution
5. This is the oldest type of computer
6. Examples: older generation mainframes, minicomputers, workstations and single processor/core PCs.
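As an illustrative baseline (not from the slides), a serial vector addition runs one instruction stream over one data stream, one element at a time; the SIMD sketch further below parallelizes this same operation.

```c
/* Serial (SISD) baseline: one instruction stream works through one
   data stream element by element. */
#include <stddef.h>

void vec_add_serial(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];     /* one addition per step */
}
```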
Single Instruction, Multiple Data (SIMD):
• A type of parallel computer
• Single Instruction: all processing units execute the same instruction at any given clock cycle
• Multiple Data: each processing unit can operate on a different data element
• Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
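A hedged sketch of SIMD in practice, assuming an x86 CPU with SSE (compile with SSE enabled, e.g. gcc -msse): the single instruction _mm_add_ps adds four data elements in lockstep. The function name and the scalar remainder loop are illustrative, not part of the slides.

```c
/* SIMD sketch using x86 SSE intrinsics: one instruction adds four
   floats at a time; a scalar loop handles any leftover elements. */
#include <immintrin.h>
#include <stddef.h>

void vec_add_simd(const float *a, const float *b, float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* load 4 floats      */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));   /* 4 additions at once */
    }
    for (; i < n; i++)                              /* scalar remainder    */
        c[i] = a[i] + b[i];
}
```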
Multiple Instruction, Single Data (MISD):
• A type of parallel computer
• Multiple Instruction: each processing unit operates on the data independently via separate instruction streams
• Single Data: a single data stream is fed into multiple processing units
• Few (if any) actual examples of this class of parallel computer have ever existed
• Some conceivable uses might be:
  – multiple cryptography algorithms attempting to crack a single coded message
Multiple Instruction, Multiple Data (MIMD):
• A type of parallel computer
• Multiple Instruction: every processor may be executing a different instruction stream
• Multiple Data: every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
• Currently the most common type of parallel computer; most modern supercomputers fall into this category
• Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs
• Note: many MIMD architectures also include SIMD execution sub-components
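A minimal MIMD sketch using POSIX threads, assuming a shared-memory multi-core machine (compile with -pthread): two threads execute different instruction streams, a sum and a maximum, on different data streams. All names below are illustrative.

```c
/* MIMD sketch: two threads run different code on different data. */
#include <pthread.h>
#include <stdio.h>

static double data_a[4] = {1, 2, 3, 4};
static int    data_b[4] = {7, 5, 9, 2};

static void *sum_task(void *arg) {       /* instruction stream 1 */
    (void)arg;
    double s = 0;
    for (int i = 0; i < 4; i++) s += data_a[i];
    printf("sum = %f\n", s);
    return NULL;
}

static void *max_task(void *arg) {       /* instruction stream 2 */
    (void)arg;
    int m = data_b[0];
    for (int i = 1; i < 4; i++) if (data_b[i] > m) m = data_b[i];
    printf("max = %d\n", m);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_task, NULL);
    pthread_create(&t2, NULL, max_task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```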
Some General Parallel Terminology
• Supercomputing / High Performance Computing (HPC): Using the world's fastest and largest computers to solve large problems.
• Node: A standalone "computer in a box", usually comprised of multiple CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked together to comprise a supercomputer.
• CPU / Socket / Processor / Core: The CPU (Central Processing Unit) was originally a singular execution component for a computer. Then multiple CPUs were incorporated into a node, and individual CPUs were subdivided into multiple "cores", each being a unique execution unit. CPUs with multiple cores are sometimes called "sockets" (vendor dependent). The result is a node with multiple CPUs, each containing multiple cores.
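As a small illustration (assuming Linux/glibc; not part of the slides), a program can ask how many cores the node it runs on exposes:

```c
/* Sketch: query how many cores are online on the current node. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);   /* online cores on this node */
    printf("cores available on this node: %ld\n", cores);
    return 0;
}
```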
• Task: A logically discrete section of computational work. A task is typically a program or program-like set of instructions that is executed by a processor. A parallel program consists of multiple tasks running on multiple processors.
• Pipelining: Breaking a task into steps performed by different processor units, with inputs streaming through, much like an assembly line; a type of parallel computing.
• Shared Memory: From a strictly hardware point of view, describes a computer architecture where all processors have direct (usually bus-based) access to common physical memory. In a programming sense, it describes a model where parallel tasks all have the same "picture" of memory and can directly address and access the same logical memory locations regardless of where the physical memory actually exists.
• Distributed Memory: In hardware, refers to network-based memory access for physical memory that is not common. As a programming model, tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing.
• Symmetric Multi-Processor (SMP): Shared-memory hardware architecture where multiple processors share a single address space and have equal access to all resources.
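A brief shared-memory sketch, assuming OpenMP is available (compile with -fopenmp): all threads address the same logical arrays directly, with no explicit communication. On a distributed-memory system the equivalent program would have to exchange data through messages (e.g. MPI); that side is not shown here.

```c
/* Shared-memory sketch with OpenMP: every thread sees the same arrays
   and writes its portion of 'c' directly, without any message passing. */
#include <stdio.h>

int main(void) {
    static float a[1000], b[1000], c[1000];
    for (int i = 0; i < 1000; i++) { a[i] = i; b[i] = 2 * i; }

    #pragma omp parallel for        /* each thread handles a chunk of i */
    for (int i = 0; i < 1000; i++)
        c[i] = a[i] + b[i];         /* all threads write to the shared c */

    printf("c[999] = %f\n", c[999]);
    return 0;
}
```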
• Synchronization: The coordination of parallel tasks in real time, very often associated with communications. Often implemented by establishing a synchronization point within an application where a task may not proceed further until another task (or tasks) reaches the same or a logically equivalent point.
• Massively Parallel: Refers to the hardware that comprises a given parallel system, i.e. one having many processing elements. The meaning of "many" keeps increasing, but currently the largest parallel computers are comprised of processing elements numbering in the hundreds of thousands to millions.
• Embarrassingly Parallel: Solving many similar but independent tasks simultaneously; little to no need for coordination between the tasks.
• Scalability: Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel speedup with the addition of more resources. Factors that contribute to scalability include:
  – Hardware, particularly memory-CPU bandwidths and network communication properties
  – Application algorithm
  – Parallel overhead
  – Characteristics of your specific application
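A small synchronization sketch, again assuming OpenMP (compile with -fopenmp): the barrier is a point that no thread may pass until every thread in the team has reached it.

```c
/* Synchronization point: no thread starts phase 2 until all threads
   have finished phase 1. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("thread %d: finished phase 1\n", id);

        #pragma omp barrier          /* wait here for all threads */

        printf("thread %d: starting phase 2\n", id);
    }
    return 0;
}
```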
Costs of Parallel Programming
• Amdahl's Law states that potential program speedup is defined by the fraction of code (P) that can be parallelized:

      speedup = 1 / (1 - P)

• If none of the code can be parallelized, P = 0 and the speedup = 1 (no speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in theory).
• If 50% of the code can be parallelized, maximum speedup = 2, meaning the code will run twice as fast.
• Introducing the number of processors performing the parallel fraction of work, the relationship can be modeled by:

                     1
      speedup = -----------
                 P/N  +  S

  where P = parallel fraction, N = number of processors and S = serial fraction (S = 1 - P).
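A short numeric sketch of Amdahl's Law (the value P = 0.95 and the processor counts are arbitrary examples, not from the slides): even with a 95% parallel fraction, the speedup can never exceed 1/(1-P) = 20, no matter how many processors are added.

```c
/* Amdahl's Law: speedup = 1 / (S + P/N), with S = 1 - P.
   Prints the predicted speedup of a 95% parallel code for a few
   processor counts, showing how the serial fraction caps speedup. */
#include <stdio.h>

static double amdahl(double p, int n) {
    double s = 1.0 - p;               /* serial fraction */
    return 1.0 / (s + p / n);
}

int main(void) {
    double p = 0.95;                  /* parallel fraction (example value) */
    int counts[] = {1, 2, 8, 64, 1024};
    for (int i = 0; i < 5; i++)
        printf("N = %4d  speedup = %6.2f\n", counts[i], amdahl(p, counts[i]));
    /* As N grows, the speedup approaches 1/(1-P) = 20, never more. */
    return 0;
}
```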
