© 2013 EDB All rights reserved. 1 Implementing Parallelism in PostgreSQL • Robert Haas | PGCon 2014
© 2014 EDB All rights reserved. 2 • Between 1996 and 2004, single-threaded CPU performance on SPECint and SPECfp benchmarks increased by >50% per year. Between 2004 and 2012, it increased by ~21% per year. − http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/ • Single-threaded 7-zip performance was only 39% faster on 2 x Intel Xeon L5640 (March 16, 2010; 2.7 GHz, 12 MB cache) than on 4 x AMD Opteron 880 (September 26, 2005; 2.4 GHz, 2MB cache). That's only 7.7% per year. − http://www.anandtech.com/show/6825/inside-anandtech-2013-cpu-performance Parallelism: Why? (1)
© 2014 EDB All rights reserved. 3 • Dell Configuration Tool (as of 2014-05-15): − 2x Intel® Xeon® E7-4890 v2 Processor 2.8GHz, 37.5M Cache, 8.0 GT/s QPI, Turbo, 15 Core, 155W [add $7,735.68] − 2x Intel® Xeon® E7-8893 v2 Processor 3.4GHz, 37.5M Cache, 8.0 GT/s QPI, Turbo, 6 Core, 155W [add $8,410.30] Parallelism: Why? (2)
© 2014 EDB All rights reserved. 4 Hash Join Join Cond: foo.x = bar.x → Seq Scan on foo Filter: something_complicated → Hash → Seq Scan on bar • One backend could run the Seq Scan and apply the filter condition; it could then stream the results to another backend to perform the Hash Join. Parallel Query: Inter-Node
© 2014 EDB All rights reserved. 5 Hash Join Join Cond: foo.x = bar.x → Seq Scan on foo Filter: something_complicated → Hash → Seq Scan on bar • Multiple backends could cooperate to perform Seq Scan – or Hash Join. Parallel Query: Intra-Node
© 2014 EDB All rights reserved. 6 • CREATE INDEX − Parallel Heap Scan − Parallel Sort • VACUUM − Parallel Heap Scan − Worker Per Index (suggestion from Andres and Heikki) Parallel Maintenance / DDL Commands
© 2014 EDB All rights reserved. 7 • Processes – Not Threads − None of our fundamental subsystems are thread-safe (e.g. palloc/pfree, ereport, syscache, relcache, buffer manager). − Making them thread-safe would add synchronization overhead even in the single-threaded case – and also bugs. • Started By Postmaster – Not created via fork() − Can't fork() on Windows, where many of our users are. − Currently, all backends are direct children of the postmaster; seems preferable to keep it that way. Architectural Overview (1)
© 2014 EDB All rights reserved. 8 • Shared Memory – Not Pipes or Files − Files would cause more system calls and more I/O. − Pipes are a good paradigm, but shared memory is more flexible. − We can use shared memory to emulate a pipe if we need to – see shm_mq. (This also dodges platform dependencies.) • Dynamic Shared Memory – Not Main Segment − For an application such as parallel sort, we might need a LOT of memory, like a terabyte. We can't pre-reserve that! • Dynamic Shared Memory Could Be At a Different Address in Every Process − No good, general techniques for achieving this. Architectural Overview (2)
© 2014 EDB All rights reserved. 9 • Basic Facilities (done in 9.4) • Plumbing (some work done/in progress) • Parallel Environment (a little unpublished work done) • Parallel Execution (some study/thought) • Parallel Planning (no idea yet) What Do We Need To Build?
© 2014 EDB All rights reserved. 10 • Dynamic Background Workers (done in 9.4) • Dynamic Shared Memory (done in 9.4) Basic Facilities
© 2014 EDB All rights reserved. 11 • DSM Table of Contents (done in 9.4) − I just mapped this dynamic shared memory segment; how do I figure out what it contains? • Message Queueing (done in 9.4) − How does a background worker send tuples, errors, notices, etc. to a user backend? • Error Propagation (working on it) − Common infrastructure to make using message queueing easy. • Shared Memory Allocator (early draft posted) • Shared Hash Table (someday) Plumbing
© 2014 EDB All rights reserved. 12 • Make the Background Worker Look Enough Like a Regular User Backend To Do Useful Work − Copy Relevant State (e.g. User, Database, Snapshot) • Useful Work Doesn't Mean Everything − Some operations seem fundamentally unsafe in a parallel context (e.g. calling a user-defined function that sets a GUC). − Some operations could theoretically be made safe, but we might not bother (e.g. setseed() + random()). − Even if we share lots of state, arbitrary user-supplied code can never be safe; must label unsafe functions. Parallel Environment
© 2014 EDB All rights reserved. 13 • User ID and Database • GUCs • Transaction State • Current and Active Snapshot • Combo CID Hash Parallel Environment: What To Copy
© 2014 EDB All rights reserved. 14 • Sequence Operations • Generation of Invalidation Messages • Cursor Operations • Large Object Manipulation • LISTEN/NOTIFY • Access to Temporary Buffers • Prepared Statements Parallel Environment: What To Prohibit
© 2014 EDB All rights reserved. 15 • Background Workers Can't Rely on User Backend To Hold Necessary Locks − The user backend might die or be killed before the background worker terminates. • If Background Workers Re-Lock The Same Relations, Parallel Query Might Self Deadlock − User backend locks X; another process queues for a conflicting lock on X; background worker tries to re-lock X. • Probably Need a Concept of Locking Groups Inside the Lock Manager Parallel Environment: Lock Management
© 2014 EDB All rights reserved. 16 • This is the “easy” part. • Parallel sorting algorithms are described in the literature and well-understood. • For parallel sequential scan, grab blocks or block ranges in alternation. • Amdahl's Law: If α is the fraction of running time a program spends executing serially, the maximum speedup from parallelism is 1/α. Parallel Execution
© 2014 EDB All rights reserved. 17 • This is probably hard. • Right now, we do costing based on estimating the page access costs (CPU and I/O) and tuple processing costs. • For parallelism, need to consider worker startup costs and IPC costs. • A plan that's a little cheaper for me might be much more expensive in total. Parallel Query Planning
© 2014 EDB All rights reserved. 18 • Any questions? Thanks.

Implementing Parallelism in PostgreSQL - PGCon 2014

  • 1.
    © 2013 EDBAll rights reserved. 1 Implementing Parallelism in PostgreSQL • Robert Haas | PGCon 2014
  • 2.
    © 2014 EDBAll rights reserved. 2 • Between 1996 and 2004, single-threaded CPU performance on SPECint and SPECfp benchmarks increased by >50% per year. Between 2004 and 2012, it increased by ~21% per year. − http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/ • Single-threaded 7-zip performance was only 39% faster on 2 x Intel Xeon L5640 (March 16, 2010; 2.7 GHz, 12 MB cache) than on 4 x AMD Opteron 880 (September 26, 2005; 2.4 GHz, 2MB cache). That's only 7.7% per year. − http://www.anandtech.com/show/6825/inside-anandtech-2013-cpu-performance Parallelism: Why? (1)
  • 3.
    © 2014 EDBAll rights reserved. 3 • Dell Configuration Tool (as of 2014-05-15): − 2x Intel® Xeon® E7-4890 v2 Processor 2.8GHz, 37.5M Cache, 8.0 GT/s QPI, Turbo, 15 Core, 155W [add $7,735.68] − 2x Intel® Xeon® E7-8893 v2 Processor 3.4GHz, 37.5M Cache, 8.0 GT/s QPI, Turbo, 6 Core, 155W [add $8,410.30] Parallelism: Why? (2)
  • 4.
    © 2014 EDBAll rights reserved. 4 Hash Join Join Cond: foo.x = bar.x → Seq Scan on foo Filter: something_complicated → Hash → Seq Scan on bar • One backend could run the Seq Scan and apply the filter condition; it could then stream the results to another backend to perform the Hash Join. Parallel Query: Inter-Node
  • 5.
    © 2014 EDBAll rights reserved. 5 Hash Join Join Cond: foo.x = bar.x → Seq Scan on foo Filter: something_complicated → Hash → Seq Scan on bar • Multiple backends could cooperate to perform Seq Scan – or Hash Join. Parallel Query: Intra-Node
  • 6.
    © 2014 EDBAll rights reserved. 6 • CREATE INDEX − Parallel Heap Scan − Parallel Sort • VACUUM − Parallel Heap Scan − Worker Per Index (suggestion from Andres and Heikki) Parallel Maintenance / DDL Commands
  • 7.
    © 2014 EDBAll rights reserved. 7 • Processes – Not Threads − None of our fundamental subsystems are thread-safe (e.g. palloc/pfree, ereport, syscache, relcache, buffer manager). − Making them thread-safe would add synchronization overhead even in the single-threaded case – and also bugs. • Started By Postmaster – Not created via fork() − Can't fork() on Windows, where many of our users are. − Currently, all backends are direct children of the postmaster; seems preferable to keep it that way. Architectural Overview (1)
  • 8.
    © 2014 EDBAll rights reserved. 8 • Shared Memory – Not Pipes or Files − Files would cause more system calls and more I/O. − Pipes are a good paradigm, but shared memory is more flexible. − We can use shared memory to emulate a pipe if we need to – see shm_mq. (This also dodges platform dependencies.) • Dynamic Shared Memory – Not Main Segment − For an application such as parallel sort, we might need a LOT of memory, like a terabyte. We can't pre-reserve that! • Dynamic Shared Memory Could Be At a Different Address in Every Process − No good, general techniques for achieving this. Architectural Overview (2)
  • 9.
    © 2014 EDBAll rights reserved. 9 • Basic Facilities (done in 9.4) • Plumbing (some work done/in progress) • Parallel Environment (a little unpublished work done) • Parallel Execution (some study/thought) • Parallel Planning (no idea yet) What Do We Need To Build?
  • 10.
    © 2014 EDBAll rights reserved. 10 • Dynamic Background Workers (done in 9.4) • Dynamic Shared Memory (done in 9.4) Basic Facilities
  • 11.
    © 2014 EDBAll rights reserved. 11 • DSM Table of Contents (done in 9.4) − I just mapped this dynamic shared memory segment; how do I figure out what it contains? • Message Queueing (done in 9.4) − How does a background worker send tuples, errors, notices, etc. to a user backend? • Error Propagation (working on it) − Common infrastructure to make using message queueing easy. • Shared Memory Allocator (early draft posted) • Shared Hash Table (someday) Plumbing
  • 12.
    © 2014 EDBAll rights reserved. 12 • Make the Background Worker Look Enough Like a Regular User Backend To Do Useful Work − Copy Relevant State (e.g. User, Database, Snapshot) • Useful Work Doesn't Mean Everything − Some operations seem fundamentally unsafe in a parallel context (e.g. calling a user-defined function that sets a GUC). − Some operations could theoretically be made safe, but we might not bother (e.g. setseed() + random()). − Even if we share lots of state, arbitrary user-supplied code can never be safe; must label unsafe functions. Parallel Environment
  • 13.
    © 2014 EDBAll rights reserved. 13 • User ID and Database • GUCs • Transaction State • Current and Active Snapshot • Combo CID Hash Parallel Environment: What To Copy
  • 14.
    © 2014 EDBAll rights reserved. 14 • Sequence Operations • Generation of Invalidation Messages • Cursor Operations • Large Object Manipulation • LISTEN/NOTIFY • Access to Temporary Buffers • Prepared Statements Parallel Environment: What To Prohibit
  • 15.
    © 2014 EDBAll rights reserved. 15 • Background Workers Can't Rely on User Backend To Hold Necessary Locks − The user backend might die or be killed before the background worker terminates. • If Background Workers Re-Lock The Same Relations, Parallel Query Might Self Deadlock − User backend locks X; another process queues for a conflicting lock on X; background worker tries to re-lock X. • Probably Need a Concept of Locking Groups Inside the Lock Manager Parallel Environment: Lock Management
  • 16.
    © 2014 EDBAll rights reserved. 16 • This is the “easy” part. • Parallel sorting algorithms are described in the literature and well-understood. • For parallel sequential scan, grab blocks or block ranges in alternation. • Amdahl's Law: If α is the fraction of running time a program spends executing serially, the maximum speedup from parallelism is 1/α. Parallel Execution
  • 17.
    © 2014 EDBAll rights reserved. 17 • This is probably hard. • Right now, we do costing based on estimating the page access costs (CPU and I/O) and tuple processing costs. • For parallelism, need to consider worker startup costs and IPC costs. • A plan that's a little cheaper for me might be much more expensive in total. Parallel Query Planning
  • 18.
    © 2014 EDBAll rights reserved. 18 • Any questions? Thanks.