High performance energy efficient multicore embedded computing

HIGH-PERFORMANCE ENERGY EFFICIENT MULTICORE EMBEDDED COMPUTING
Ankit Talele, 130913014, M.TECH (CSE), 01/10/2013
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
(IEEE TRANSACTIONS ON PARALLEL & DISTRIBUTED SYSTEMS, VOL. 23, NO. 4, APRIL 2012)
Arslan Munir, Sanjay Ranka and Ann Gordon-Ross

CONTENT
1 Introduction
2 Embedded Applications
3 Architectural Approaches
4 Hardware-Assisted Middleware Approaches
5 Software Approaches
6 High-Performance Energy-Efficient Multicore Processors
7 Conclusions, Challenges and Future Research Directions
8 References

1. Introduction
- Embedded system design is traditionally power-centric, but there has been a recent shift toward high-performance embedded computing (HPEC) due to the increasing requirements of compute-intensive embedded applications.
- The high-performance energy-efficient multicore embedded computing (HPEEC) domain addresses the unique design challenges of high-performance and low-power/energy embedded computing.

- Meeting both goals is challenging: high performance typically requires maximum processor speeds with enormous energy consumption, whereas low power typically requires nominal or low processor speeds that offer modest performance.
- To meet HPEEC power-performance requirements, embedded system design has transitioned from a single-core to a multicore paradigm that favors multiple low-power cores running at low processor speeds rather than a single high-speed, power-hungry core.
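The preference for several slow cores over one fast core can be made concrete with the standard first-order CMOS dynamic power approximation. The following is a sketch under stated assumptions, not a result taken from the source paper: the symbols $\alpha$ (activity factor), $C$ (switched capacitance), $V$ (supply voltage) and $f$ (clock frequency) are introduced here for illustration, the supply voltage is assumed to scale roughly linearly with frequency, and the workload is assumed to parallelize perfectly across two cores.

```latex
\begin{align*}
P_{\text{dyn}} &\approx \alpha C V^{2} f
  && \text{(one core at voltage } V \text{, frequency } f\text{)} \\
P_{\text{two cores}} &\approx 2 \cdot \alpha C \left(\tfrac{V}{2}\right)^{2} \tfrac{f}{2}
  = \tfrac{1}{4}\, \alpha C V^{2} f
  && \text{(two cores, each at } V/2 \text{ and } f/2\text{)}
\end{align*}
```

Under these idealized assumptions the two slower cores deliver the same aggregate clock rate (2 x f/2 = f) at roughly one quarter of the dynamic power. Real designs gain less, because voltage cannot be scaled indefinitely, leakage power is ignored here, and few workloads parallelize perfectly.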

Table 1 Comparison between Supercomputing Applications and Embedded Applications

Sr | Supercomputing Applications | Embedded Applications
1 | Performance is the most significant metric. | Energy efficiency is the primary concern in HPEEC.
2 | Large number of processors. | Comparatively fewer processors.
3 | Each subset of a task is executed on a different processor. | Each task is executed on a single processor.

Fig 1 Green Supercomputer and an Embedded System

Fig 1.1 A Multi-Core Processor

2. Embedded Applications
Fig 2.1 Characteristics of Embedded Applications
- Embedded systems have proliferated across various domains, such as consumer electronics, automotive, industrial automation, networking, medical, defense, and space.
- Different embedded applications have different characteristics.

2.1 Throughput-Intensive
- Throughput-intensive embedded applications are applications that require high processing throughput.
- Examples: networking and multimedia applications.
2.2 Thermal-Constrained
- An embedded application is thermal-constrained if an increase in temperature above a threshold could lead to incorrect results or even failure of the embedded system.
- Examples: fan-based applications.

2.3 Reliability-Constrained
- Embedded systems with high reliability constraints are typically required to operate for many years without errors, or to recover from errors on their own, because many of them are deployed in harsh environments where post-deployment removal and maintenance is infeasible.
- Examples: space missions and aircraft flight controllers.
2.4 Parallel & Distributed
- Parallel and distributed embedded applications leverage distributed embedded devices that cooperate and aggregate their functionalities or resources.
- Examples: wireless sensor network (WSN) applications.

3. Architectural Approaches
Architectural approaches can be broadly grouped into four categories: core layout, memory design, interconnection network, and reduction techniques.
Fig 3.1 Architectural Approaches in Embedded Systems

3.1 Core Layout
In this section, we discuss various core layout techniques encompassing chip and processor design, since high performance cannot be achieved from semiconductor technology advancements alone.
3.1.1 Heterogeneous CMP
- Heterogeneous chip multiprocessors (CMPs) consist of multiple cores of varying size, performance, and complexity on a single die, exploiting both instruction-level parallelism (ILP) and thread-level parallelism (TLP).
- Heterogeneous CMPs can provide performance gains as high as 40 percent, but at the expense of additional customization cost.

3.1.2 Conjoined-Core CMP
- Conjoined-core CMPs are multiprocessors that allow topologically feasible resource sharing between adjacent cores.
- For instance, one core may use the shared resource during even cycles and the other core during odd cycles, or one core may use the resource for the first five cycles, another core for the next five cycles, and so on.
- Results indicate that conjoined-core CMPs can reduce area requirements by 50 percent.
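The two time-sharing schemes mentioned for conjoined cores (alternating cycles and fixed five-cycle slices) can be illustrated with a small sketch. The function names and the `slice_len` and `num_cores` parameters are hypothetical, chosen here for illustration; a real arbiter is implemented in hardware.

```python
def owner_alternating(cycle: int) -> int:
    """Core 0 owns the shared resource on even cycles, core 1 on odd cycles."""
    return cycle % 2


def owner_block(cycle: int, num_cores: int = 2, slice_len: int = 5) -> int:
    """Each core owns the resource for slice_len consecutive cycles, in rotation."""
    return (cycle // slice_len) % num_cores


if __name__ == "__main__":
    # Ownership of the shared resource over the first cycles under each policy.
    print([owner_alternating(c) for c in range(6)])
    print([owner_block(c) for c in range(12)])
```

Both policies are static, so no core can starve the other, at the cost of leaving the resource idle when its current owner has nothing to issue.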

3.2 Memory Design
The cache miss rate, fetch latency, and data transfer bandwidth are some of the main factors impacting the performance and energy consumption of embedded systems.
3.2.1 Transactional Memory
- Transactional memory brings the notion of a transaction (a sequence of instructions executed by a single process with the properties of atomicity, consistency, and isolation) into parallel programming, coordinating concurrent threads to achieve efficient lock-free synchronization.
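The transaction idea can be sketched in software with optimistic concurrency: a transaction buffers its writes, records the version of everything it reads, and commits only if none of those versions changed. This is a minimal illustrative sketch, not the mechanism of any particular hardware transactional memory; the names `TVar`, `Transaction`, and `atomically` are hypothetical, and the sketch uses one internal lock during commit, so it is not itself lock-free.

```python
import threading


class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised when a variable read by the transaction changed before commit."""


_commit_lock = threading.Lock()  # assumption: a single global lock serializes commits


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value  # invisible to other threads until commit

    def commit(self):
        with _commit_lock:
            # Validate: abort if anything we read has been updated since.
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    raise Conflict()
            # Publish all buffered writes together (atomicity).
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(fn):
    """Run fn(tx) as a transaction, retrying until it commits without conflict."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except Conflict:
            continue


if __name__ == "__main__":
    a, b = TVar(100), TVar(0)

    def transfer(tx):
        tx.write(a, tx.read(a) - 10)
        tx.write(b, tx.read(b) + 10)

    atomically(transfer)
    print(a.value, b.value)
```

The application code in `transfer` contains no explicit locking; conflicts are detected at commit time and handled by re-executing the transaction.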

- Transactional memory benefits from hardware support that ranges from complete execution of transactions in hardware to hardware-accelerated software implementations of transactional memory.
3.2.2 Cache Partitioning
- Generally used in real-time applications.
- Partitions shared on-chip resources (e.g., level two (L2) or level three (L3) caches, interconnect networks) among the cores.
Fig 3.2 Multi-Level Cache Partitioning

- Cache partitioning can enhance performance by assigning larger portions of shared caches to cores with heavier workloads than to cores with lighter workloads.
3.2.3 Smart Caching
- Smart caching focuses on energy-efficient computing and leverages cache set (way) prediction and low-power cache design techniques.
- Way prediction enables faster average cache access time and reduces power consumption because only the predicted way is accessed if the prediction is correct.
- Such techniques can reduce cache static and dynamic energy by 50-75 percent.
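The workload-proportional assignment described under cache partitioning (Section 3.2.2) can be sketched as a simple way-allocation routine. This is an illustrative sketch only: the function name `partition_ways`, the use of a per-core workload number (for example, recent cache accesses), and the `min_ways` guarantee are assumptions made here, not details from the source paper.

```python
def partition_ways(workloads, total_ways, min_ways=1):
    """Split the ways of a shared cache among cores in proportion to workload.

    workloads  -- one non-negative number per core (hypothetical load metric)
    total_ways -- associativity of the shared cache
    min_ways   -- ways guaranteed to every core regardless of its load
    """
    n = len(workloads)
    if total_ways < n * min_ways:
        raise ValueError("not enough ways to give every core its minimum")

    spare = total_ways - n * min_ways
    total_load = sum(workloads)
    if total_load == 0:
        shares = [spare / n] * n
    else:
        shares = [spare * w / total_load for w in workloads]

    # Give each core the whole part of its share, then hand the remaining
    # ways to the cores with the largest fractional parts.
    alloc = [min_ways + int(s) for s in shares]
    leftover = total_ways - sum(alloc)
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                         reverse=True)
    for i in by_fraction[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # One heavily loaded core and three lightly loaded cores sharing 16 ways.
    print(partition_ways([300, 100, 100, 100], total_ways=16))
```

In a real-time setting the same routine could be run with fixed workloads at design time, so that each core's share, and therefore its worst-case cache behavior, stays predictable.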

3.3 Interconnection Network
- As the number of on-chip cores increases, a scalable and high-bandwidth interconnection network to connect on-chip resources becomes crucial.
- Interconnection networks can be static or dynamic.
3.3.1 Interconnection Topology
- The interconnect topology governs the number of hops or routers a message must traverse as well as the interconnection length.
- The interconnect topology determines the communication latency and energy dissipation.

- The interconnect topology cost is dictated by the node degree and the length of the interconnecting wire.
Fig 3.3 Graph with Node Degree Representation
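The trade-off between hop count (latency, energy) and node degree (cost) can be illustrated by comparing two common topologies. The sketch below builds a ring and a 2D mesh as adjacency lists and measures both quantities with breadth-first search; the choice of these two topologies and the helper names are assumptions made for illustration.

```python
from collections import deque


def ring(n):
    """Ring of n >= 3 nodes: each node links to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def mesh(rows, cols):
    """2D mesh: each node links to its north, south, west and east neighbours."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            if r > 0:
                nbrs.append((r - 1) * cols + c)
            if r < rows - 1:
                nbrs.append((r + 1) * cols + c)
            if c > 0:
                nbrs.append(r * cols + c - 1)
            if c < cols - 1:
                nbrs.append(r * cols + c + 1)
            adj[r * cols + c] = nbrs
    return adj


def average_hops(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def max_degree(adj):
    """Largest number of links at any node (a proxy for router cost)."""
    return max(len(nbrs) for nbrs in adj.values())


if __name__ == "__main__":
    for name, topo in (("ring-16", ring(16)), ("mesh-4x4", mesh(4, 4))):
        print(name, round(average_hops(topo), 3), max_degree(topo))
```

For 16 nodes, the mesh needs fewer hops on average than the ring but requires routers with up to four links instead of two, which is the cost side of the trade-off.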

3.3.2 Wireless Interconnect
- Wireless interconnect is an emerging technology that promises to provide high bandwidth, low latency, and low energy dissipation by eliminating lengthy wired interconnects.
- Carbon nanotubes (CNTs) are used for wireless interconnects.
- Experiments indicate that a wireless interconnect can reduce the communication latency by 20-45 percent.

3.4 Reduction Techniques
- Embedded system architectural design must consider power dissipation reduction techniques.
3.4.1 Leakage Current Reduction
- Leakage current reduction techniques include back biasing, silicon-on-insulator technologies, multithreshold MOS transistors, and power gating.
3.4.2 Short Circuit Current Reduction
- The short circuit current can be reduced using low-level design techniques that aim to reduce the time during which both the nMOSFET and the pMOSFET are on.

4. Hardware-Assisted Middleware Approaches
Hardware-assisted middleware approaches can be broadly grouped into three categories.
Fig 4.1 Hardware Approaches in Embedded Systems

4.1 Threading Techniques
- Different threading techniques target high performance by enabling a single processor to execute multiple threads.
4.1.1 Hyper-Threading
- Hyper-threading leverages simultaneous multithreading to make a single processor appear as two logical processors and allows instructions from both logical processors to execute simultaneously on the shared resources.

Fig 4.1.1 Hyper-Threading Technology
- Hyper-threading enables the OS to schedule multiple threads to the processor so that different threads can use the idle execution units.

4.1.2 Speculative Threading
Fig 4.1.2 Speculative Threading
- Speculative threading approaches provide high performance by removing unnecessary serialization in programs.
- Speculative multithreading divides a sequential program into multiple contiguous program segments called tasks and executes these tasks in parallel on multiple cores.

4.2 Gating Techniques
4.2.1 Power Gating
- Power gating is a power management technique that reduces leakage power by switching off the supply voltage to idle logic elements after detecting no activity for a certain period of time.
4.2.2 Per-Core Power Gating
- Per-core power gating is a fine-grained power gating technique that individually switches off idle cores.
4.2.3 Split Power Planes
- Split power planes minimize both static and dynamic power dissipation by removing power from idle portions of the chip.
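The idle-timeout policy behind power gating (Section 4.2.1) can be sketched as a small simulation for one core or logic block. This is a simplified illustration: the function name, the per-cycle activity trace, and the `idle_threshold` parameter are hypothetical, and wake-up latency and the energy cost of switching are deliberately ignored.

```python
def simulate_power_gating(activity, idle_threshold):
    """Return the power state ("on" or "off") of a block for each cycle.

    activity       -- iterable of booleans, True when the block is busy
    idle_threshold -- number of consecutive idle cycles tolerated before
                      the supply is switched off
    """
    states = []
    idle_cycles = 0
    for busy in activity:
        if busy:
            idle_cycles = 0          # any activity wakes the block immediately
            states.append("on")
        else:
            idle_cycles += 1
            # Stay powered while idleness is still within the threshold.
            states.append("off" if idle_cycles > idle_threshold else "on")
    return states


if __name__ == "__main__":
    trace = [True, False, False, False, False, True]
    print(simulate_power_gating(trace, idle_threshold=2))
```

Running one such simulation per core corresponds to per-core power gating: each core is switched off on its own idle history rather than that of the whole chip. A larger threshold wastes leakage power during short idle periods, while a smaller one risks frequent, costly power cycling.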

5. Software Approaches
Fig 5 Software Approaches in Embedded Systems
- Software approaches enable high performance by scheduling and migrating tasks statically or dynamically to meet application requirements.
5.1 Task Scheduling
- The task scheduling problem can be defined as determining an optimal assignment of tasks to cores that minimizes power consumption while keeping the chip temperature below the ceiling enforced by dynamic thermal management (DTM), with minimal or no performance degradation, given the total energy budget.
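Finding an optimal assignment is hard in general, so the sketch below shows only a greedy heuristic for the thermal part of the problem: place each task on the currently least-loaded core and reject the schedule if the predicted temperature would exceed the ceiling. Everything here is an assumption made for illustration, in particular the linear thermal model (temperature = ambient + thermal resistance x power) and its default parameter values; it is not the scheduling method of the source paper.

```python
def schedule_tasks(task_power, num_cores, t_ambient=45.0, r_thermal=2.0,
                   t_ceiling=85.0):
    """Greedily assign tasks to cores under a temperature ceiling.

    task_power -- power demand of each task in watts (hypothetical inputs)
    Returns (assignment, core_power): assignment maps task index to core index.
    Raises ValueError if a task cannot be placed without exceeding t_ceiling.
    """
    core_power = [0.0] * num_cores
    assignment = {}
    # Place the most power-hungry tasks first so they spread across cores.
    ordered = sorted(enumerate(task_power), key=lambda tp: tp[1], reverse=True)
    for task, power in ordered:
        core = min(range(num_cores), key=lambda c: core_power[c])
        # Assumed steady-state model: T = T_ambient + R_thermal * P.
        predicted = t_ambient + r_thermal * (core_power[core] + power)
        if predicted > t_ceiling:
            raise ValueError(f"task {task} would exceed the temperature ceiling")
        core_power[core] += power
        assignment[task] = core
    return assignment, core_power


if __name__ == "__main__":
    assignment, core_power = schedule_tasks([8.0, 6.0, 5.0, 3.0], num_cores=2)
    print(assignment, core_power)
```

Because the coolest candidate core is always tried, a failure means no core could accept the task under this model; a fuller scheduler would then lower the frequency or defer the task rather than give up.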

5.2 Task Migration
- In a multithreaded environment, threads can periodically enter and leave cores. Thread migration is a dynamic power management (DPM) and dynamic thermal management (DTM) technique that allows a scheduled thread to execute, be preempted, or migrate to another core based on the thread's thermal or power profile.
- Thread migration techniques can be characterized as rotation-based, temperature-based, or power-based.
5.3 Load Balancing
- Load balancing techniques distribute a workload equally across all the cores in a multicore embedded system.
- Load imbalance can be caused by either extrinsic or intrinsic factors.
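A temperature-based migration step can be sketched as follows: when the hottest core exceeds a threshold, one of its threads is moved to the coolest core. This is a minimal sketch under assumptions introduced here (per-core temperature readings, a fixed threshold, and moving the most recently queued thread); real policies also weigh migration cost and each thread's power profile.

```python
def migrate_from_hottest(core_temps, core_threads, threshold):
    """Move one thread from the hottest core to the coolest core if needed.

    core_temps   -- current temperature of each core
    core_threads -- list of thread lists, one per core (modified in place)
    threshold    -- temperature above which a core must shed work
    Returns (thread, source_core, target_core), or None if nothing moved.
    """
    cores = range(len(core_temps))
    hot = max(cores, key=lambda c: core_temps[c])
    cool = min(cores, key=lambda c: core_temps[c])

    if core_temps[hot] <= threshold or hot == cool or not core_threads[hot]:
        return None  # no overheating, or nothing that can be moved

    thread = core_threads[hot].pop()
    core_threads[cool].append(thread)
    return thread, hot, cool


if __name__ == "__main__":
    temps = [90.0, 60.0, 70.0]
    threads = [["a", "b"], ["c"], []]
    print(migrate_from_hottest(temps, threads, threshold=80.0))
    print(threads)
```

A rotation-based policy would instead move threads on a fixed schedule regardless of the readings, and a power-based policy would compare per-core power estimates rather than temperatures.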

6. High-Performance Energy-Efficient Multicore Processors
- Several high-performance energy-efficient processors have been developed, including:
1. Tilera TILEPro64 and TILE-Gx
2. Intel Xeon processors
3. Graphics processing units (GPUs)

Table 3 High-Performance Energy Efficient Multicore Processors

7. Conclusions, Challenges & Future Research
- HPEEC is an active and expanding research domain with applications ranging from consumer electronics to supercomputers.
- The introduction of HPEEC into supercomputing has boosted the significance of the HPEEC domain, as power is becoming a concern for modern supercomputing considering the long-term operation and cooling costs.

Table 3 High-Performance Energy Efficient Computing Challenges

