
I'm setting up a new PostgreSQL 13 cluster on Ubuntu 20.04, in a virtual machine. The performance I'm seeing via pgbench is neither good nor bad. I'm trying to work out why it isn't higher by finding where it's being capped.

When running pgbench, I had assumed I would see an obvious bottleneck:

  • In CPU. Yet htop shows none of the cores fully maxed, and load is at or below 2 on a 4-core machine.
  • In disk. Other benchmarks show the disks can run far faster than what happens under pgbench. The numbers are reasonable, and doubling the disk speed (iSCSI) makes no difference.
  • In network. But iftop shows reasonable numbers. Connecting via the local socket instead of TCP does speed things up (roughly double the tps), but even then the bottleneck is still not CPU or disk.

[Screenshots: htop and iotop output]

The htop and iotop output above is from an experiment going via 127.0.0.1 (to ensure the network stack is involved). iotop, if I understand it correctly, shows that the process is blocked on IO less than 1% of the time.

Default pgbench params, but 30 clients:

pgbench -h 127.0.0.1 -U myuser mydb -T 50 -c 30 

But the clients are all blocked, waiting, during the UPDATE statement. Is there something obvious I'm missing here? What other tools can I run to understand the issue better?
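One way to see what the blocked clients are waiting on is to sample pg_stat_activity while pgbench runs and count the wait events. The sketch below is only an illustration: the query uses standard pg_stat_activity columns, but summarize_waits and the sample rows are made up for this example and are not measurements from this cluster. If most backends report Lock waits, that would suggest contention on rows rather than a CPU, disk, or network limit, which is what the pgbench documentation warns about when the scale factor is smaller than the number of clients.

```python
from collections import Counter

# Query to run (for example via psql) while pgbench is active.
# wait_event_type, wait_event, state and backend_type are standard
# pg_stat_activity columns in PostgreSQL 13.
WAIT_QUERY = """
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'client backend' AND state = 'active'
"""


def summarize_waits(rows):
    """Count backends per (wait_event_type, wait_event), most common first.

    A NULL wait event (None here) means the backend was not waiting
    at the moment the sample was taken.
    """
    counts = Counter(
        (wtype or "not waiting", event or "-") for wtype, event in rows
    )
    return counts.most_common()


# Hypothetical rows, standing in for the result of WAIT_QUERY.
sample_rows = (
    [("Lock", "transactionid")] * 3
    + [("IO", "WALSync")]
    + [(None, None)]
)

for (wtype, event), n in summarize_waits(sample_rows):
    print(f"{n:3d}  {wtype}/{event}")
```

Sampling the query several times during a run gives a rough profile of where the 30 clients spend their time.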

EDIT: adding later pgbench output:

nik@pgdb2:~$ sudo -u postgres pgbench -s 100 -h pgdb1 -p 5433 -c 30 -T 5 -j 2
Password:
scale option ignored, using count from pgbench_branches table (100)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 30
number of threads: 2
duration: 5 s
number of transactions actually processed: 9814
latency average = 15.372 ms
tps = 1951.612278 (including connections establishing)
tps = 1955.965884 (excluding connections establishing)
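As a sanity check on these numbers, the reported average latency is roughly the client count divided by tps, so each of the 30 clients finishes a transaction about every 15 ms. The sketch below parses the three relevant lines (copied from the output above) and recomputes that figure; parse_pgbench is a hypothetical helper written for this illustration, not part of pgbench.

```python
import re

# The relevant lines, copied from the pgbench output above.
PGBENCH_OUTPUT = """
number of clients: 30
latency average = 15.372 ms
tps = 1951.612278 (including connections establishing)
"""


def parse_pgbench(text):
    """Extract client count, average latency (ms) and tps from pgbench output."""

    def grab(pattern, cast):
        match = re.search(pattern, text)
        return cast(match.group(1)) if match else None

    return {
        "clients": grab(r"number of clients: (\d+)", int),
        "latency_ms": grab(r"latency average = ([\d.]+) ms", float),
        "tps": grab(r"tps = ([\d.]+) \(including", float),
    }


stats = parse_pgbench(PGBENCH_OUTPUT)

# With every client always busy, average latency is about clients / tps.
implied_latency_ms = stats["clients"] / stats["tps"] * 1000
per_client_tps = stats["tps"] / stats["clients"]

print(f"reported latency: {stats['latency_ms']:.3f} ms")
print(f"implied latency:  {implied_latency_ms:.3f} ms")
print(f"per-client tps:   {per_client_tps:.1f}")
```

Since tps here is just the client count divided by per-transaction latency, the useful question becomes what each transaction spends its roughly 15 ms waiting on.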
  • What's the scale factor? If using the default (1), use at least 100 instead. See postgresql.org/docs/current/pgbench.html . Also please show the results displayed by pgbench at the end. Commented Mar 3, 2021 at 1:42
  • Many thanks, Daniel. With scale factor 100, I'm seeing about a 10x difference in TPS. I also discovered the -j argument, and setting that to the number of cores made an improvement. But at scale factor 500 it's no good, so there is clearly a sweet spot. The CPUs are still not maxed at scale factor 100. I am seeing a lot of activity from blkid in iotop, though, where it's blocked > 90% of the time on IO. I guess this now proves that IO is where it's deadlocked? Commented Mar 4, 2021 at 10:16

1 Answer


Have you tuned your PostgreSQL configuration?

By default, Postgres is tuned for wide compatibility rather than performance.

The Wiki page linked above is a good resource to start with, as is this tool, and there is also a whole list of other resources on the Wiki.

There are also good resources from EDB and Percona, two of the big names in commercial PostgreSQL deployment.
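To give a feel for what such tuning involves, here is a minimal sketch that derives two starting values from total RAM using common rules of thumb: about 25% of RAM for shared_buffers (the starting point suggested in the PostgreSQL documentation) and about 75% for effective_cache_size. The starting_settings helper and the 8 GB figure are assumptions for illustration only; the question does not say how much memory the VM has, and the right values depend on the workload.

```python
def starting_settings(total_ram_mb):
    """Return rule-of-thumb starting values for two memory settings.

    These are starting points to benchmark from, not recommendations
    for any specific machine.
    """
    shared_buffers_mb = total_ram_mb // 4             # about 25% of RAM
    effective_cache_size_mb = total_ram_mb * 3 // 4   # about 75% of RAM
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "effective_cache_size": f"{effective_cache_size_mb}MB",
    }


# Hypothetical VM with 8 GB of RAM (assumed, not from the question).
for name, value in starting_settings(8192).items():
    # Printed in postgresql.conf syntax.
    print(f"{name} = {value}")
```

After changing settings like these, rerun the same pgbench command to see whether tps moves.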
