The document summarizes several industry-standard benchmarks for measuring database and application server performance, including SPECjAppServer2004, EAStress2004, TPC-E, and TPC-H. It discusses PostgreSQL's performance on these benchmarks and the key configuration parameters used. SPECjAppServer2004 and EAStress2004 show good performance, while PostgreSQL's TPC-E performance has room for improvement. TPC-H performance requires further optimization of indexes and query plans.
About Me
• Working with Sun Microsystems for about 7 1/2 years
> Primary responsibility at Sun is to make ISV and open source community software applications work better on Solaris
• Prior to Sun, worked as an ERP consultant
• Worked with various databases (DB2 UDB, PostgreSQL, MySQL, Progress OpenEdge, Oracle)
• Worked with various ERP (QAD, Lawson) and CRM (Dispatch-1) systems
• Previous responsibilities also included low-cost BIDW
SPECjAppServer2004
• SPECjAppServer2004 is the current version
• Review by SPEC is required before a result is published (results are published on spec.org)
• Metric is JOPS = jAppServer Operations Per Second
• A good workload for measuring the impact of database changes from one version to another (rather than for comparing systems, operating systems and/or other databases)
SPECjAppServer2004 Characteristics
• J2EE application with a database backend
• Response times depend on database performance, among other things
• Not a database micro-benchmark, but not an exhaustive database test either
• Typically single-row queries/updates/inserts
• No stored procedures
• Mostly highlights combined J2EE and database performance
PostgreSQL's SPECjAppServer2004 Performance
• Two published SPECjAppServer2004 results using Glassfish and PostgreSQL 8.2 on Solaris
> 778.14 JOPS with Glassfish v1
> 813.73 JOPS with Glassfish v2
• PostgreSQL is in the top category for low overall price and price/performance
Mandatory Disclosure: SPECjAppServer2004 JOPS@Standard
Sun Fire X4200 M2 (4 chips, 8 cores) - 813.73 SPECjAppServer2004 JOPS@Standard
Sun Fire X4200 M2 (6 chips, 12 cores) - 778.14 SPECjAppServer2004 JOPS@Standard
SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation. All results from www.spec.org as of Jan 8, 2008
EAStress2004
• EAStress2004 is the RESEARCH mode of SPECjAppServer2004
• No review by SPEC required
• The EAStress2004 metric (HASOPM) is not equivalent to, and hence should not be compared with, the SPECjAppServer2004 metric (JOPS)
• A good workload for measuring the impact of database changes from one version to another (rather than for comparing systems, operating systems and/or other databases)
EAStress2004 Characteristics
• In many ways a subset of SPECjAppServer2004, but not equivalent, since SPECjAppServer2004 adds more workload tasks
• Has the potential to be put into a regression test suite for PostgreSQL
• Stresses IO, scalability and response times
PostgreSQL's EAStress2004 Performance
[Chart: EAStress2004 with PostgreSQL. Bars: PostgreSQL 8.2 (32-bit), PostgreSQL 8.3 (64-bit). Axis: EAStress Metric (HASOPM), 0 to 700]
• EAStress2004 HASOPM – Hundreds of Application Server Operations Per Minute
• 46% improvement just by changing the database underneath it
• Highlights database performance impact to EAStress
Differences between 8.3/8.2:
• 64-bit vs 32-bit
• sync_commit=false
• Higher shared_buffers
**Missing data point with 8.3 (32-bit) which could have been very helpful
SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation.
TPC-E Highlights
● Complex schema
● Referential integrity
● Less partitionable
● Increased # of transactions
● Transaction Frames
● Non-primary key access to data
● Data access requirements (RAID)
● Complex transaction queries
● Extensive foreign key relationships
● TPC provided core components
TPC-E Sample Setup
[Diagram labels: Driver; System Under Test; Tier A (App. Servers); Tier B (Database Server, Data); Mandatory Network between Driver and Tier A; Network]
Image From: http://www.tpc.org/tpce/spec/TPCEpresentation.ppt
TPC-E Characteristics
• Brokerage house workload
• The scale factor, expressed as the number of active customers, depends on the target performance (roughly, every 1K customers = 7.1GB of raw data to be loaded)
• Lots of constraints and foreign keys
• Business logic (part of the system) can be implemented via stored procedures or other mechanisms
• Can be used to stress multiple features of the database: random IO reads/writes, index performance, stored procedure performance, response times, etc.
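
The rule of thumb above (roughly 7.1GB of raw data per 1K customers) can be turned into a quick sizing estimate. The following is only a sketch: the function name and the sample customer counts are illustrative assumptions, and the 7.1GB figure is the slide's approximation, not an exact TPC-E value.

```python
# Rough TPC-E load sizing based on the slide's rule of thumb.
# GB_PER_1K_CUSTOMERS is the approximate figure quoted on the slide;
# the function name and sample customer counts are hypothetical.
GB_PER_1K_CUSTOMERS = 7.1


def estimated_raw_data_gb(active_customers: int) -> float:
    """Return the approximate raw data volume (GB) to load."""
    if active_customers < 0:
        raise ValueError("active_customers must be non-negative")
    return active_customers / 1000 * GB_PER_1K_CUSTOMERS


if __name__ == "__main__":
    for customers in (1_000, 5_000, 20_000):
        size = estimated_raw_data_gb(customers)
        print(f"{customers:>6} customers -> ~{size:.1f} GB raw data")
```

The estimate covers raw flat-file data only; indexes and database overhead would add to the on-disk size.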
How is PostgreSQL behaving right now with TPC-E?
• Setup process is very slow with PostgreSQL
• A table with few rows is hot for updates (Broker)
• High volume of random reads, which block (trade and trade_history)
• Adding indexes hurts trade update performance, while fewer indexes hurt trade lookup performance
• More contention if client streams are increased even slightly, resulting in a drop in performance
How is PostgreSQL behaving right now with TPC-E?
• With some work, it could be possible to publish a competitive TPC-E result with PostgreSQL
TPC-H
• Industry Standard TPC Benchmark
• Data Warehousing / Decision Support
• Simulates an ad hoc environment where there is little pre-knowledge of the queries
• Simple Schema
> 8 Tables
> 3NF, not Star
TPC-H
• Different scale factors: 100GB, 300GB, 1000GB, 3000GB
• 22 queries
• 2 refresh functions (insert, delete)
• Single-stream component: power
• Multi-stream component: throughput
• Ad hoc nature enforced by implementation rules
> Indexes only on primary key, foreign key and date columns
How PostgreSQL Behaves
• The power run runs a single stream of queries
> Since PostgreSQL can use only one core per query, it is difficult to exploit the capabilities of multi-core systems
• For research purposes, it's useful to see how PostgreSQL performs even in a single stream
How PostgreSQL Behaves
• Current runs indicate that without the right index(es) it is hard for the PostgreSQL optimizer to produce good plans
> However, indexes on such huge tables are slow to create, and you can never guess the next index required (in real-world BIDW)
> COPY took 02:12:06 while INDEX creation took 11:33:47
> Commercial databases have found good ways to live with few indexes for this type of workload
• Range partitioning, table partitioning and clustering are more important
> Hard to provide a single logical view of a partitioned table for inserts/updates
> Very hard to set up table partitioning that is compliant with the run rules
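
To put the two load timings quoted above in proportion, the sketch below converts them to seconds and compares them. The helper name `parse_hms` is an illustrative assumption; only the two durations come from the slide.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an HH:MM:SS string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Durations as quoted on the slide.
copy_time = parse_hms("02:12:06")
index_time = parse_hms("11:33:47")

# Dividing two timedeltas yields a plain float ratio.
ratio = index_time / copy_time
total = copy_time + index_time

if __name__ == "__main__":
    print(f"COPY:  {copy_time.total_seconds():.0f} s")
    print(f"INDEX: {index_time.total_seconds():.0f} s")
    print(f"Index creation took about {ratio:.2f}x as long as COPY")
    print(f"Total load time: {total}")
```

On these figures, index creation takes a little over five times as long as the data load itself, which is why the choice of indexes matters so much for this workload.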
How PostgreSQL Behaves
• Query profiles without range partitioning or clustering but with many indexes:
> Queries which are user CPU (core) bound = 1, 7, 8, 12, 13, 15, 19, 21
> Queries which are user+sys CPU (core) bound = 2, 3, 11, 15, 18
> Queries which are suspiciously idle = 9, 17, 20, 22
> Queries which return 0 rows immediately = 4, 5, 6, 10, 14
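
A quick consistency check of the profile lists above against the 22 TPC-H queries can be sketched as follows. The category labels are shortened for illustration; the query numbers are copied from the slide as given.

```python
from collections import Counter

# Query numbers exactly as listed on the slide; labels are abbreviated.
profiles = {
    "user CPU bound": [1, 7, 8, 12, 13, 15, 19, 21],
    "user+sys CPU bound": [2, 3, 11, 15, 18],
    "suspiciously idle": [9, 17, 20, 22],
    "returns 0 rows immediately": [4, 5, 6, 10, 14],
}

# Count how often each query number appears across all categories.
counts = Counter(q for queries in profiles.values() for q in queries)

all_queries = set(range(1, 23))  # TPC-H defines queries 1 through 22
missing = sorted(all_queries - set(counts))
duplicated = sorted(q for q, n in counts.items() if n > 1)

if __name__ == "__main__":
    print("Not classified:", missing)
    print("Listed more than once:", duplicated)
```

As transcribed, query 16 does not appear in any category and query 15 appears in two, so one of the two entries for 15 may be a typo for 16; the slide itself does not say which.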
Summary/Next Steps
• Good overall status with SPECjAppServer2004 and EAStress
• EAStress is a good load for regression testing
• TPC-E with PostgreSQL has room for improvement
> Highlights hot contention on the BROKER table
> Need to work with the community to see if it is a schema problem or some inherent problem in PostgreSQL
• TPC-H with PostgreSQL will require more detailed investigation
> Figure out problems with broken queries
> Optimizer plan is key to performance
> Need to work with the community
Acknowledgements
• Performance and Benchmark Team, Sun
> Vince Carbone (TPC-H)
> Glenn Fawcett (TPC-E)
> John Fowler Jr
• ISV Engineering, Sun
> Tom Daly (SPECjAppServer / EAStress)
More Information
• PostgreSQL Question: <postgresql-question@sun.com>
• Blogs on PostgreSQL
> Josh Berkus: http://blogs.ittoolbox.com/database/soup
> Jignesh Shah: http://blogs.sun.com/jkshah/
> Tom Daly: http://blogs.sun.com/tomdaly/
> Robert Lor: http://blogs.sun.com/robertlor/
• PostgreSQL on Solaris Wiki: http://wikis.sun.com/display/DBonSolaris/PostgreSQL
• OpenSolaris databases community: databases-discuss@opensolaris.org
TPC-E Scaling Design
● DBMS size and metric scale with the number of emulated customers in the database
● Transactions designed for consistent scaling, independent of architecture
● Transactions designed to access “any row, anywhere”, which increases cross-node and cross-schema communication
● “Any customer emulation”: any driver can emulate any customer at any time, and possibly the same customer simultaneously across drivers
● All results are comparable
TPC-E Transaction Overview
● Broker Volume – Total potential volume for a subset of brokers of all Trades in a given sector for a specific customer tier – Single Frame
● Customer Position – Reports the current market value for each account of a customer – Single Frame
● Security Detail – Returns all information pertaining to a specific security: financial, news, stock performance ... – Single Frame
● Trade Status – Status of the most recent trade for a customer – Single Frame
● Market Watch – Calculates the percentage change in value of the market capitalization for a set of securities – Multiple Independent Single Frames
TPC-E Transaction Overview – Cont'd
● Trade Lookup – Returns all information relating to a specific trade determined by either: 1) trade-id, or 2) customer-id and a timestamp – Multiple Independent Frames
● Trade Update – Same as Trade Lookup, but modifies the data returned, i.e. “Settle cash transactions” – Multiple Independent Frames
● Trade Order – Request to buy/sell a quantity of a security for a customer account either via a market or limit order – Single Multi Frame Transaction
● Trade Result – The completion of a confirmed Trade Order from the “Market” – Single Multi Frame Transaction
● Market Feed – Updates the last traded values for a security from the “ticker” (Market Exchange Emulator) – Single Multi Frame Transaction
TPC-E Reported Metrics
● Primary Metrics
  ● tpsE: qualified throughput metric; total number of Trade-Result transactions completed in the measurement interval divided by the measurement interval in seconds
  ● $/tpsE: total 3-year cost divided by the throughput metric
● Additional Reported Metrics
  ● # of processors, cores and threads
  ● Durability Redundancy Level
  ● Database Recovery Time
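
The two primary metric definitions above reduce to simple arithmetic, sketched below. The function names and all input numbers are made up for illustration and are not results from any run; a real tpsE figure also has to satisfy the TPC-E run rules, which this sketch does not model.

```python
def tps_e(trade_results_completed: int, interval_seconds: float) -> float:
    """Trade-Result transactions completed per second of measurement interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return trade_results_completed / interval_seconds


def price_per_tps_e(total_3yr_cost: float, throughput_tps_e: float) -> float:
    """Total 3-year cost divided by the throughput metric."""
    if throughput_tps_e <= 0:
        raise ValueError("throughput_tps_e must be positive")
    return total_3yr_cost / throughput_tps_e


if __name__ == "__main__":
    # Hypothetical inputs: 720,000 Trade-Results in a 2-hour interval,
    # and a made-up 3-year system cost of $50,000.
    throughput = tps_e(720_000, 2 * 60 * 60)
    print(f"tpsE   = {throughput:.2f}")
    print(f"$/tpsE = {price_per_tps_e(50_000, throughput):.2f}")
```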
TPC-H Reporting Requirements
● Scale factor, e.g., @1000GB
● Composite performance metric: QphH
● Price/performance: $/QphH
● System availability date
● Results at different scale factors are not comparable, per TPC
TPC-H Reported Metrics
● Primary Metrics
  ● Composite Metric (QphH@size)
    ● Composite of the Power and Throughput metrics
  ● Price/Performance Metric ($/QphH@size)
● Secondary Metrics
  ● Power Numerical Quantity (QppH@size)
    ● How fast a single stream of queries performs
  ● Throughput Numerical Quantity (QthH@size)
    ● How fast multiple streams of queries perform
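
The slide only says that QphH@size is a composite of the power and throughput metrics. The sketch below assumes the composite is their geometric mean, which is how the TPC-H specification defines it as far as I am aware; treat that formula, the function names, and every input value as assumptions for illustration rather than as figures from these runs.

```python
import math


def composite_qphh(power_qpph: float, throughput_qthh: float) -> float:
    """Composite metric, assumed here to be the geometric mean of
    the power (QppH@size) and throughput (QthH@size) quantities."""
    if power_qpph < 0 or throughput_qthh < 0:
        raise ValueError("metrics must be non-negative")
    return math.sqrt(power_qpph * throughput_qthh)


def price_per_qphh(total_system_price: float, qphh: float) -> float:
    """Price/performance: total system price divided by QphH@size."""
    if qphh <= 0:
        raise ValueError("qphh must be positive")
    return total_system_price / qphh


if __name__ == "__main__":
    # Hypothetical inputs, not measured values.
    qphh = composite_qphh(power_qpph=400.0, throughput_qthh=100.0)
    print(f"QphH@size   = {qphh:.1f}")
    print(f"$/QphH@size = {price_per_qphh(100_000, qphh):.2f}")
```

Because the composite is a geometric mean under this assumption, a weak single-stream (power) result pulls the overall metric down even when multi-stream throughput is good.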