Multi-Core Computing
3/12/2024

Multi-Core Computer
• A multi-core microprocessor is one that combines two or more independent processors into a single package, often a single integrated circuit (IC).
• A dual-core device contains two independent microprocessors.
• In general, multi-core microprocessors allow a computing device to exhibit some form of thread-level parallelism (TLP) without including multiple microprocessors in separate physical packages.

Major Technology Providers
• The latest versions of many architectures use multi-core, including PA-RISC (PA-8800), IBM POWER (POWER7), SPARC (UltraSPARC IV), and various processors from Intel and AMD.
• There is some controversy as to whether multiple cores on a chip are the same thing as multiple processors. Major technology providers are divided on this issue.
• IBM considers its dual-core POWER4 and POWER5 to be two processors, just packaged together.
• Sun Microsystems, in contrast, considers its UltraSPARC IV to be a multi-threaded rather than multi-processor chip.
• Intel considers its multi-core designs to be a single processor.

Single-core computer
[Figure: single-core computer]

Multi-core architectures
• Replicate multiple processor cores on a single die.
[Figure: multi-core CPU chip containing Core 1, Core 2, Core 3, and Core 4]

Multi-core CPU chip
• The cores fit on a single processor socket
• Also called CMP (Chip Multi-Processor)
[Figure: chip with core 1, core 2, core 3, and core 4]

The cores run in parallel
[Figure: thread 1, thread 2, thread 3, and thread 4 running on core 1, core 2, core 3, and core 4 respectively]

Within each core, threads are time-sliced (just like on a uniprocessor)
[Figure: several threads queued on each of core 1, core 2, core 3, and core 4]
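
The two ideas above (cores running in parallel, and time-slicing within each core) can be sketched as a toy model. This is only an illustration under stated assumptions: the function names `assign_round_robin` and `run_time_sliced`, the thread names, and the workload numbers are all hypothetical, and real OS schedulers also handle priorities, affinity, and migration between cores.

```python
from collections import deque

def assign_round_robin(threads, n_cores):
    """Distribute thread names over cores, giving each core its own run queue."""
    queues = [deque() for _ in range(n_cores)]
    for i, name in enumerate(threads):
        queues[i % n_cores].append(name)
    return queues

def run_time_sliced(queue, work, quantum):
    """Round-robin the threads of ONE core; return the order in which slices ran.

    work maps thread name -> time units still needed (a made-up workload).
    """
    remaining = {name: work[name] for name in queue}
    ready = deque(queue)
    trace = []
    while ready:
        name = ready.popleft()
        trace.append(name)            # this thread gets the core for one quantum
        remaining[name] -= quantum
        if remaining[name] > 0:       # not finished: back to the end of the queue
            ready.append(name)
    return trace

threads = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
work = {name: 2 for name in threads}  # every thread needs 2 time units
queues = assign_round_robin(threads, n_cores=4)
traces = [run_time_sliced(q, work, quantum=1) for q in queues]
for core, trace in enumerate(traces, start=1):
    print(f"core {core}: {trace}")
```

Each core's trace is independent of the others, which is the sense in which the cores run in parallel while the threads on one core merely take turns.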

Interaction with OS
• OS perceives each core as a separate processor
• OS scheduler maps threads/processes to different cores
• Most major operating systems support multi-core today
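
One way to see how many processors the OS perceives is to ask it. A minimal sketch using Python's standard `os.cpu_count()`; the wrapper name `logical_cpu_count` is a hypothetical helper. Note that the value counts logical processors, so it can be larger than the number of physical cores on SMT-enabled chips.

```python
import os

def logical_cpu_count():
    """Number of logical processors the OS reports, or 1 if it cannot tell."""
    # os.cpu_count() returns None when the count cannot be determined.
    return os.cpu_count() or 1

n = logical_cpu_count()
print(f"The OS reports {n} logical processor(s)")
```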

Why multi-core?
• Difficult to make single-core clock frequencies even higher
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)

Instruction-level parallelism
• Parallelism at the machine-instruction level
• The processor can reorder and pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
• Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years

Thread-level parallelism (TLP)
• This is parallelism on a coarser scale
• A server can serve each client in a separate thread (Web server, database server)
• A computer game can do AI, graphics, and physics in three separate threads
• Single-core superscalar processors cannot fully exploit TLP
• Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP
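
The game example above can be sketched with one thread per subsystem. This is a structural illustration only: the `subsystem` function and the `results` dictionary are hypothetical placeholders, and in the standard CPython interpreter the global interpreter lock generally keeps CPU-bound threads from executing truly in parallel, so the sketch shows how work is split into threads rather than a measured speedup.

```python
import threading

def subsystem(name, results):
    """Placeholder for one frame of work in a game subsystem."""
    results[name] = f"{name} updated"   # each thread writes its own key

results = {}
workers = [
    threading.Thread(target=subsystem, args=(name, results))
    for name in ("AI", "graphics", "physics")
]
for w in workers:
    w.start()      # the OS scheduler may place each thread on a different core
for w in workers:
    w.join()       # wait until all three subsystems have finished

print(sorted(results))
```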

General context: Multiprocessors
• A multiprocessor is any computer with several processors
• SIMD
  - Single instruction, multiple data
  - Modern graphics cards
• MIMD
  - Multiple instructions, multiple data
  - Lemieux cluster, Pittsburgh Supercomputing Center

Multiprocessor memory types
• Shared memory: In this model, there is one (large) common shared memory for all processors
• Distributed memory: In this model, each processor has its own (small) local memory, and its content is not replicated anywhere else

Multi-core as a multiprocessor
• A multi-core processor is a special kind of multiprocessor: all processors are on the same chip
• Multi-core processors are MIMD: different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data).
• Multi-core is a shared-memory multiprocessor: all cores share the same memory
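
Because all cores share the same memory, threads can communicate through ordinary variables, but concurrent updates need coordination. A minimal sketch, assuming a hypothetical shared counter held in a dictionary named `shared` and protected by a `threading.Lock`:

```python
import threading

shared = {"counter": 0}        # one memory location visible to every thread
lock = threading.Lock()

def add_many(n):
    """Increment the shared counter n times, one locked update at a time."""
    for _ in range(n):
        with lock:             # only one thread may update the counter at once
            shared["counter"] += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"])       # 4 threads x 10,000 increments = 40000
```

Without the lock, the read-modify-write steps of different threads could interleave and some increments could be lost, which is the kind of hazard shared-memory programming has to manage.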

What applications benefit from multi-core?
• Database servers
• Web servers (Web commerce)
• Telecommunication markets: 6WINDGate (datapath and control plane)
• Multimedia applications
• Scientific applications, CAD/CAM
• In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
Each can run on its own core

More examples
• Editing a photo while recording a TV show through a digital video recorder
• Downloading software while running an anti-virus program
• “Anything that can be threaded today will map efficiently to multi-core”
• BUT: some applications are difficult to parallelize

Simultaneous multithreading (SMT)
• Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
• Weaving together multiple “threads” on the same core
• Example: if one thread is waiting for a floating point operation to complete, another thread can use the integer units

Without SMT, only a single thread can run at any given time
[Figure: one core's pipeline (BTB and I-TLB, decoder, trace cache, rename/alloc, uop queues, schedulers, integer and floating point units, L1 D-cache and D-TLB, uCode ROM, L2 cache and control, bus), shown once with only Thread 1 (floating point) and once with only Thread 2 (integer operation)]

SMT processor: both threads can run concurrently
[Figure: the same pipeline with Thread 1 (floating point) and Thread 2 (integer operation) active together]

But: can't simultaneously use the same functional unit
[Figure: the same pipeline with Thread 1 and Thread 2 on the same unit, marked IMPOSSIBLE]
This scenario is impossible with SMT on a single core (assuming a single integer unit)
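
The three scenarios above can be mimicked with a deliberately simplified model. Everything here is an assumption made for illustration: the function name `smt_cycles`, one integer unit and one floating point unit, one instruction per thread per cycle, single-cycle instructions, and a fixed rule that the first thread wins any conflict. Real SMT cores are far more elaborate.

```python
def smt_cycles(thread_a, thread_b):
    """Count cycles needed to finish two instruction streams on one toy SMT core.

    Each stream is a list of functional-unit names, "int" or "fp".
    If both threads need the same unit in a cycle, only thread A issues.
    """
    a, b = list(thread_a), list(thread_b)
    cycles = 0
    while a or b:
        cycles += 1
        if a and b and a[0] == b[0]:
            a.pop(0)          # conflict on one unit: thread B has to wait
        else:
            if a:
                a.pop(0)      # different units (or one thread idle): both issue
            if b:
                b.pop(0)
    return cycles

mixed = smt_cycles(["fp"] * 4, ["int"] * 4)      # threads use different units
clash = smt_cycles(["int"] * 4, ["int"] * 4)     # threads compete for one unit
print(mixed, clash)                              # prints "4 8"
```

In this model the mixed workload finishes in half the cycles of the conflicting one, matching the point that SMT helps only when the threads need different functional units at the same time.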

SMT not a “true” parallel processor
• Enables better threading (e.g. up to 30%)
• OS and applications perceive each simultaneous thread as a separate “virtual processor”
• The chip has only a single copy of each resource
• Compare to multi-core: each core has its own copy of resources

Multi-core: threads can run on separate cores
[Figure: two complete core pipelines side by side, each with its own BTB and I-TLB, decoder, trace cache, rename/alloc, uop queues, schedulers, integer and floating point units, L1 D-cache and D-TLB, uCode ROM, L2 cache and control, and bus; shown once with Thread 1 and Thread 3 and once with Thread 2 and Thread 4]

Combining Multi-core and SMT
• Cores can be SMT-enabled (or not)
• The different combinations:
  - Single-core, non-SMT: standard uniprocessor
  - Single-core, with SMT
  - Multi-core, non-SMT
  - Multi-core, with SMT
• The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads
• Intel calls them “hyper-threads”

SMT Dual-core: all four threads can run concurrently
[Figure: two SMT-enabled core pipelines side by side, running Thread 1, Thread 2, Thread 3, and Thread 4]
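
The four combinations differ only in how many hardware threads can run at once, which is the core count multiplied by the SMT threads per core. A small sketch of that arithmetic; the function name `logical_processors` is hypothetical, and a count of 2 hardware threads per SMT core is assumed for the examples.

```python
def logical_processors(cores, smt_threads_per_core=1):
    """Hardware threads that can run concurrently on one chip."""
    return cores * smt_threads_per_core

combinations = {
    "single-core, non-SMT": logical_processors(1),
    "single-core, with SMT": logical_processors(1, 2),
    "multi-core (dual), non-SMT": logical_processors(2),
    "multi-core (dual), with SMT": logical_processors(2, 2),
}
for name, count in combinations.items():
    print(f"{name}: {count}")
```

The last entry gives 4, the SMT dual-core case in which all four threads run concurrently.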

Comparison: multi-core vs SMT
• Multi-core:
  - Since there are several cores, each is smaller and not as powerful (but also easier to design and manufacture)
  - However, great with thread-level parallelism
• SMT:
  - Can have one large and fast superscalar core
  - Great performance on a single thread
  - Mostly still only exploits instruction-level parallelism

The memory hierarchy
• If simultaneous multithreading only: all caches shared
• Multi-core chips:
  - L1 caches private
  - L2 caches private in some architectures and shared in others
• Memory is always shared

Example: dual-core Intel Xeon
• Dual-core Intel Xeon processors
• Each core is hyper-threaded
• Private L1 caches
• Shared L2 caches
[Figure: CORE 0 and CORE 1, each running hyper-threads and each with its own L1 cache, above a single L2 cache connected to memory]

Designs with private L2 caches
• Both L1 and L2 are private. Examples: AMD Opteron, AMD Athlon, Intel Pentium D
[Figure: CORE 0 and CORE 1, each with its own L1 cache and its own L2 cache, connected to memory]
• A design with L3 caches. Example: Intel Itanium 2
[Figure: CORE 0 and CORE 1, each with its own L1, L2, and L3 cache, connected to memory]

Windows Task Manager
[Figure: Windows Task Manager screenshot with core 1 and core 2 labeled]

Advantages / Disadvantages

Advantages
• Cache coherency circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip
• Signals between different CPUs travel shorter distances, so those signals degrade less
• These higher quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often
• A dual-core processor uses slightly less power than two coupled single-core processors

Disadvantages
• The ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
• Most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture)
• Two processing cores sharing the same system bus and memory bandwidth limit the real-world performance advantage.
• If a single core is close to being memory bandwidth limited, going to dual-core might only give a 30% to 70% improvement
• If memory bandwidth is not a problem, a 90% improvement can be expected
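
The 3 GHz single-core versus 2 GHz dual-core comparison can be explored with a rough estimate. The model below is not from the slides: it uses Amdahl's law for the multi-threaded part and assumes performance scales linearly with clock frequency on the same core architecture. It ignores the memory bandwidth limits mentioned above, and the function names and the parallel fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only part of the work can use all cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def relative_performance(clock_ghz, parallel_fraction, cores):
    """Performance relative to a 1 GHz single core (assumes linear clock scaling)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)

single_3ghz = relative_performance(3.0, 0.0, cores=1)      # one core, any program
dual_serial = relative_performance(2.0, 0.0, cores=2)      # single-threaded game
dual_threaded = relative_performance(2.0, 0.9, cores=2)    # 90% of work threaded
break_even = relative_performance(2.0, 2.0 / 3.0, cores=2)

print(f"3 GHz single-core:            {single_3ghz:.2f}")
print(f"2 GHz dual-core, 0% threaded: {dual_serial:.2f}")
print(f"2 GHz dual-core, 90% threaded: {dual_threaded:.2f}")
print(f"2 GHz dual-core, 2/3 threaded: {break_even:.2f}")
```

Under these assumptions the dual-core chip only catches up with the faster single core once about two thirds of the work can run on both cores, which is consistent with a mostly single-threaded game running faster on the 3 GHz part.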

Conclusion
• Multi-core chips are an important new trend in computer architecture
• Several new multi-core chips are in design phases
• Parallel programming techniques are likely to gain importance

References
• http://en.wikipedia.org/wiki/Multi-core_(computing)
• www.princeton.edu/~jdonald/research/hyperthreading/garg_report.pdf
• www.cs.cmu.edu/~barbic/multi-core.ppt

multi-core Processor.ppt for IGCSE ICT and Computer Science Students
