MultiCore Programming 2 Presented by: Robin Aggarwal
Agenda • Concurrent Collections • Synchronization Primitives • Lazy Initialization Classes • Locks • Memory Allocations & Performance • Debugging Tools & Performance Watch • Supporting Demos • Dos & Don'ts
Concurrent Collections • ConcurrentQueue • ConcurrentStack • ConcurrentDictionary • BlockingCollection • ConcurrentBag • Namespace – System.Collections.Concurrent Note: Concurrent collections are thread-safe and optimized for concurrent access from multiple threads.
Producer-Consumer Scenarios • Pure Scenario – separate producer and consumer threads, in equal numbers • Mixed Scenario – threads both produce and consume data.
BlockingCollection
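A minimal sketch of a BlockingCollection-based producer-consumer buffer (the item type, item counts and bounded capacity of 100 are illustrative assumptions, not taken from the deck):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Bounded buffer: Add blocks once 100 items are already queued.
var buffer = new BlockingCollection<int>(boundedCapacity: 100);

var producer = Task.Run(() =>
{
    for (int i = 0; i < 1000; i++)
        buffer.Add(i);
    buffer.CompleteAdding();              // signal that no more items will arrive
});

var consumer = Task.Run(() =>
{
    // Blocks while the buffer is empty; the loop ends once CompleteAdding
    // has been called and the remaining items are drained.
    foreach (int item in buffer.GetConsumingEnumerable())
        Console.WriteLine(item);
});

Task.WaitAll(producer, consumer);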
ConcurrentQueue – Pure Producer-Consumer scenario
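A minimal sketch of the pure scenario with ConcurrentQueue, assuming one dedicated producer task and one dedicated consumer task (item counts are arbitrary):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>();
bool done = false;

var producer = Task.Run(() =>
{
    for (int i = 0; i < 100; i++)
        queue.Enqueue(i);                 // thread-safe enqueue
    Volatile.Write(ref done, true);
});

var consumer = Task.Run(() =>
{
    // Drain until the producer has finished and the queue is empty.
    while (!Volatile.Read(ref done) || !queue.IsEmpty)
    {
        if (queue.TryDequeue(out int item))
            Console.WriteLine(item);
    }
});

Task.WaitAll(producer, consumer);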
ConcurrentQueue – Mixed Producer-Consumer scenario
ConcurrentStack – Pure Producer-Consumer scenario
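A minimal sketch for ConcurrentStack in the pure scenario; PushRange and TryPopRange move several items per call, which reduces synchronization overhead (the batch size of 10 is an arbitrary illustration):

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var stack = new ConcurrentStack<int>();

// Producer pushes items in batches of 10.
var producer = Task.Run(() =>
{
    for (int i = 0; i < 100; i += 10)
        stack.PushRange(Enumerable.Range(i, 10).ToArray());
});
producer.Wait();                          // keep the sketch simple: produce first, then consume

// Consumer pops up to 10 items per call.
int[] batch = new int[10];
int popped;
while ((popped = stack.TryPopRange(batch, 0, batch.Length)) > 0)
{
    for (int j = 0; j < popped; j++)
        Console.WriteLine(batch[j]);
}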
ConcurrentStack – Mixed Producer-Consumer scenario
ConcurrentBag
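ConcurrentBag is an unordered collection optimized for mixed scenarios in which the same thread both adds and removes items (it keeps a per-thread local list internally). A small sketch, with four hypothetical worker tasks:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var bag = new ConcurrentBag<int>();

// Each worker adds its own items and then takes items back out;
// taking from the thread's own local list is the cheapest path.
Parallel.For(0, 4, worker =>
{
    for (int i = 0; i < 10; i++)
        bag.Add(worker * 10 + i);

    while (bag.TryTake(out int item))
        Console.WriteLine($"worker {worker} processed {item}");
});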
ConcurrentDictionary – Almost Read-Only Dictionary vs. Frequent Updates
ConcurrentDictionary – Concurrent Reading and Updating
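A small sketch of concurrent reading and updating; GetOrAdd and AddOrUpdate make read-or-insert and read-modify-write operations atomic per key (the word-count scenario is only an illustration):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var counts = new ConcurrentDictionary<string, int>();
string[] words = { "alpha", "beta", "alpha", "gamma", "alpha", "beta" };

// Many threads update the same dictionary; AddOrUpdate applies the
// update delegate atomically per key (it may retry under contention).
Parallel.ForEach(words, word =>
    counts.AddOrUpdate(word, 1, (_, current) => current + 1));

// GetOrAdd returns the existing value, inserting the default only if the key is new.
int gammaCount = counts.GetOrAdd("gamma", 0);
Console.WriteLine($"gamma: {gammaCount}");

// Reads (including enumeration) do not take locks.
foreach (var pair in counts)
    Console.WriteLine($"{pair.Key}: {pair.Value}");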
• DO use ConcurrentDictionary instead of a Dictionary with a lock, in particular for dictionaries that are heavily accessed from multiple threads, especially if most of the accesses are reads. Reads of a ConcurrentDictionary ensure thread safety without taking locks. • CONSIDER using BlockingCollection to represent the communication buffer in consumer-producer scenarios. In such scenarios, one or more producer threads insert elements into a BlockingCollection, and one or more consumer threads remove elements from the BlockingCollection. • DO use regular collections with locks instead of concurrent collections if you need to perform compound atomic operations that are not supported by the corresponding concurrent collection.
Synchronization Primitives • Improve the performance of multithreaded applications by enabling fine-grained concurrency and by avoiding expensive locking mechanisms. • System.Threading.CountdownEvent • System.Threading.Barrier – provides methods that allow developers to bring all parallel tasks to a synchronization point. • ManualResetEventSlim • SemaphoreSlim
Barrier (diagram): Task 1, Task 2 and Task 3 each perform some processing, then wait at barrier synchronization point no. 1; after the special processing step they wait again at barrier synchronization point no. 2, and only then continue with the rest of their processing.
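A minimal sketch of the two-phase flow in the diagram above, assuming three tasks and using the optional post-phase action to mark each synchronization point:

using System;
using System.Threading;
using System.Threading.Tasks;

// The post-phase delegate runs once after all participants reach the barrier.
var barrier = new Barrier(participantCount: 3,
    postPhaseAction: b => Console.WriteLine($"Barrier synchronization point {b.CurrentPhaseNumber + 1} reached"));

var tasks = new Task[3];
for (int t = 0; t < tasks.Length; t++)
{
    int id = t;
    tasks[id] = Task.Run(() =>
    {
        Console.WriteLine($"Task {id + 1}: some processing");
        barrier.SignalAndWait();          // synchronization point no. 1

        Console.WriteLine($"Task {id + 1}: special processing");
        barrier.SignalAndWait();          // synchronization point no. 2

        Console.WriteLine($"Task {id + 1}: rest of the processing");
    });
}

Task.WaitAll(tasks);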
CountdownEvent (diagram): CountdownEvent ce = new CountdownEvent(2); Task 1 calls ce.Wait() and blocks until the signal count reaches zero. Task 2 calls ce.Signal() (signal count becomes 1) and carries on with its work without blocking. Task 3 calls ce.Signal() (signal count becomes 0) and also carries on without blocking; once the count reaches zero, Task 1 can carry on with the rest of its work.
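The same interaction as a runnable sketch; Tasks 2 and 3 each signal once without blocking, and Task 1 resumes once the count reaches zero:

using System;
using System.Threading;
using System.Threading.Tasks;

var ce = new CountdownEvent(2);

var task1 = Task.Run(() =>
{
    ce.Wait();                            // blocks until the signal count reaches zero
    Console.WriteLine("Task 1: now I can carry on with the rest of my work");
});

var task2 = Task.Run(() =>
{
    ce.Signal();                          // decrements the count; no blocking
    Console.WriteLine("Task 2: carrying on with my work without blocking");
});

var task3 = Task.Run(() =>
{
    ce.Signal();                          // whichever Signal() arrives second drops the count to zero and releases Task 1
    Console.WriteLine("Task 3: carrying on with my work without blocking");
});

Task.WaitAll(task1, task2, task3);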
• CONSIDER using ManualResetEventSlim instead of ManualResetEvent and SemaphoreSlim instead of Semaphore. The “slim” versions of the two coordination structures speed up many common scenarios by spinning for a while before allocating real wait handles.
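A small sketch of both slim types; the gate releases all workers at once, and the SemaphoreSlim (initial count 2) is an assumed throttle limiting concurrent use of some shared resource:

using System;
using System.Threading;
using System.Threading.Tasks;

var gate = new ManualResetEventSlim(initialState: false);
var throttle = new SemaphoreSlim(initialCount: 2);

var workers = new Task[4];
for (int t = 0; t < workers.Length; t++)
{
    int id = t;
    workers[id] = Task.Run(() =>
    {
        gate.Wait();                      // spins briefly before falling back to a kernel wait handle
        throttle.Wait();                  // at most two workers inside at once
        try
        {
            Console.WriteLine($"Worker {id} using the shared resource");
        }
        finally
        {
            throttle.Release();
        }
    });
}

gate.Set();                               // release all waiting workers
Task.WaitAll(workers);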
Lazy Initialization Classes • With lazy initialization, the memory required for an object is allocated only when it is needed. • By spreading object allocations evenly across the entire lifetime of a program, these classes can drastically improve the performance of the application. • System.Lazy<T> • System.Threading.ThreadLocal<T> • System.Threading.LazyInitializer
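A minimal sketch of the three types, assuming a hypothetical ExpensiveObject whose construction we want to defer:

using System;
using System.Threading;

// Constructed on first access to shared.Value (thread-safe by default).
var shared = new Lazy<ExpensiveObject>(() => new ExpensiveObject());

// One independent instance per thread that touches perThread.Value.
var perThread = new ThreadLocal<ExpensiveObject>(() => new ExpensiveObject());

// LazyInitializer works against a plain field or variable,
// allocating only if it is still null.
ExpensiveObject cache = null;

ExpensiveObject a = shared.Value;         // allocation happens here, not at startup
ExpensiveObject b = perThread.Value;
ExpensiveObject c = LazyInitializer.EnsureInitialized(ref cache, () => new ExpensiveObject());

class ExpensiveObject
{
    public ExpensiveObject() => Console.WriteLine("ExpensiveObject constructed");
}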
• DO make sure that any static methods you implement are thread-safe. By convention, static methods should be thread-safe, and methods in the BCL follow this convention. • DO NOT use objects on one thread after they have been disposed by another thread. This mistake is easy to introduce when the responsibility for disposing an object is not clearly defined.
Locks • Locks are the most important tool for protecting shared state. A lock provides mutual exclusion – only one thread at a time can be executing code protected by a single lock. • DO NOT use publicly visible objects for locking. If an object is visible to the user, they may use it for their own locking protocol, despite the fact that such usage is not recommended. • CONSIDER using a dedicated lock object instead of reusing another object for locking. • DO NOT hold locks any longer than you have to. • DO NOT call virtual methods while holding a lock. Calling into unknown code while holding a lock poses a deadlock risk because the called code may attempt to acquire other locks. Acquiring locks in an unknown order may result in a deadlock. • DO use locks instead of advanced techniques such as lock-free programming, Interlocked, SpinLock, etc. These advanced techniques are tricky to use correctly and error-prone.
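A small sketch of the dedicated, non-public lock object recommended above (Account and Deposit are hypothetical names, not from the deck):

using System;
using System.Threading.Tasks;

var account = new Account();
Parallel.For(0, 4, _ => account.Deposit(25m));
Console.WriteLine(account.Balance);       // 100

class Account
{
    // Dedicated private lock object: callers cannot see it, so they
    // cannot accidentally take part in our locking protocol.
    private readonly object _sync = new object();
    private decimal _balance;

    public void Deposit(decimal amount)
    {
        lock (_sync)
        {
            // Keep the critical section short and avoid calling virtual
            // or otherwise unknown code while the lock is held.
            _balance += amount;
        }
    }

    public decimal Balance
    {
        get { lock (_sync) { return _balance; } }
    }
}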
Memory Allocations and Performance • It is generally a good idea to limit memory allocations in high-performance code. Even though the .NET garbage collector (GC) is highly optimized, it can have a significant impact on the performance of code that spends most of its time allocating memory. • AVOID unnecessarily allocating many small objects in your program. Watch out for boxing, string concatenation and other frequent memory allocations. • CONSIDER opting into server GC for parallel applications. • Background GC vs. Server GC – server GC maintains multiple heaps, one for each core on the machine; these heaps can be collected in parallel more easily. • Enable server GC in the application configuration file:
<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
Caches & Performance • Sometimes a parallel program degrades cache usage so badly that it ends up slower than a similar sequential program. How? For example, false sharing. • Solutions: – DO store values that are frequently modified in stack-allocated variables whenever possible. A thread's stack is stored in a region of memory modified only by the owning thread. – CONSIDER padding values that are frequently overwritten by different threads, as in the sketch below.
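A sketch of the padding workaround; the 64-byte cache-line size and the PaddedCounter layout are assumptions for illustration, not measurements from the deck:

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

var counters = new PaddedCounter[2];

Parallel.For(0, 2, t =>
{
    // Each worker repeatedly writes only its own slot; without the padding
    // the two slots would share a cache line and the writes would contend
    // (false sharing). Accumulating in a stack-allocated local and writing
    // back once would avoid the shared writes entirely, per the DO above.
    for (int i = 0; i < 10_000_000; i++)
        counters[t].Value++;
});

Console.WriteLine(counters[0].Value + counters[1].Value);

// Pad each counter out to an assumed 64-byte cache line.
[StructLayout(LayoutKind.Explicit, Size = 64)]
struct PaddedCounter
{
    [FieldOffset(0)] public long Value;
}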
Debugging and Performance Tools • Parallel Tasks • Parallel Stacks • Task Manager • Concurrency Visualizer
