
I am running Apache and MySQL performance tests, upgrading a single component at a time and rerunning the tests to see where the bottlenecks are. A local used-server warehouse (Tams in Lindon, UT) is loaning me about 20 processors tomorrow so I can compare more cores against higher clock speeds. After that I will test NVMe RAID vs SSD RAID vs HDD RAID vs a single HDD.

I wrote a script that combines the stats from several other programs, such as iostat, vmstat, and mpstat. Looking at the results, I can't understand why there is such a huge slowdown at 512 threads. According to these tools, the load on the CPU, disk, and memory at 512 threads is not much different from the load at 256 threads.

For this test I am using ApacheBench (ab) with the command:

    ab -kc 500 -n 1000 -l -s 60 http://localhost/stress.php

The results below cover concurrency levels from 16 to 750. All measurements were taken 5 seconds into the test so that it would be in full swing.
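If the runs are scripted, a small wrapper can sweep the same concurrency levels and pull the headline numbers out of ab's report. This is a minimal sketch, assuming bash and ab's usual English report labels; ab_summary and sweep are hypothetical helper names, not part of ab.

```shell
# Hypothetical helpers for scripting the ab runs; source this file in bash.

# Print "concurrency requests_per_sec ms_per_request" from one ab report on stdin.
ab_summary() {
  awk -F': *' '
    /^Concurrency Level/   { c = $2 }
    /^Requests per second/ { split($2, a, " "); rps = a[1] }
    # ab prints two "Time per request" lines; keep the first (per-client mean).
    /^Time per request/ && t == "" { split($2, b, " "); t = b[1] }
    END { printf "%s %s %s\n", c, rps, t }'
}

# Run the same URL at each concurrency level used in the question.
sweep() {
  local url=$1 c
  for c in 16 32 64 128 256 512 750; do
    # ab writes progress lines to stderr; discard them.
    ab -k -l -s 60 -c "$c" -n 1000 "$url" 2>/dev/null | ab_summary
  done
}
```

For example, sweep http://localhost/stress.php prints one line per concurrency level, which is easier to paste into a table than the full ab reports.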

Results. Every run: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, Cores: 6, Sockets: 2, DDR4, NVMe RAID, test type Apache PHP. "Threads" is the ab concurrency level. Connect and Processing are ab's connection times in ms (min, mean, sd, median, max).

    Threads  Req/s    Time/req ms   Connect ms        Processing ms
    16       127.47   125.523       0 0 0.1 0 1       59 124 177.8 121 2924
    32       164.37   194.680       0 0 0.5 0 4       65 193 380.9 143 4300
    64       166.35   384.736       0 1 3.0 0 14      130 378 547.1 311 5501
    128      162.75   786.459       0 1 4.0 0 14      149 742 1175.8 468 6124
    256      110.86   2309.182      0 3 5.2 0 14      148 1546 1985.2 830 9003
    512      18.29    28000.550     0 11 15.6 0 38    179 14189 21674.4 1677 54626
    750      18.39    40777.912     0 13 17.8 0 43    150 17962 23352.7 1807 54307

    Threads  Load avg (1/5/15 min)  vmstat procs  free -mh (total used free shared buff/cache available)
    16       0.94, 0.28, 0.15       17            62G 12G 18G 4.5M 32G 49G
    32       3.98, 0.97, 0.37       32            62G 12G 17G 4.5M 32G 49G
    64       7.13, 1.72, 0.62       59            62G 12G 17G 4.5M 32G 49G
    128      11.64, 2.82, 0.99      101           62G 12G 17G 4.5M 32G 49G
    256      30.81, 7.33, 2.49      77            62G 13G 17G 4.5M 32G 48G
    512      46.33, 11.29, 3.82     0             62G 13G 16G 4.5M 32G 48G
    750      37.39, 13.46, 4.95     0             62G 14G 16G 4.5M 32G 48G

The remaining figures were the same, or nearly the same, in every run: mpstat %usr between 1.82 and 2.00 and %idle between 97.57 and 97.75 across all 25 mpstat columns, %iowait 0.03 to 0.05 (one column at 0.11); vmstat SwapIn 0, SwapOut 0, IoIn 7, IoOut 42; iostat r/s 1.72, w/s 13.37 (13.36 at 750 threads), rMB/s 0.25, wMB/s 0.52, HDD util 0.00.

The PHP script calls json_decode(), sorts thousands of records multiple times, runs preg_replace() on some array items, and then generates random numbers with:

    for ($z = 0; $z < 10; $z++) {
        $r = array_fill(0, 11, 0);      // hit counters for values 0..10
        for ($i = 0; $i < 100000; $i++) {
            $n = mt_rand(0, 10000);
            if ($n <= 10) {
                $r[$n]++;
            }
        }
        print_r($r);
        print_r("<BR>");
    }

Are there other commands whose output I should be looking at to figure this out? My current command set:

    vmstat -w -S m
    iostat -dxmh /dev/md0
    mpstat -P ALL
    free -h
    lscpu
    top -b -n
    # Between each test, clear the caches:
    sync; echo 3 > /proc/sys/vm/drop_caches
    swapoff -a && swapon -a
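One thing worth checking in this command set: mpstat run without an interval reports averages since boot, iostat's first report covers the time since boot, and vmstat's first line is an average since boot. That may explain why the CPU and disk columns barely move from run to run. A minimal sketch that samples over fixed intervals while the load generator runs; sample_during, INTERVAL, COUNT, and the log file names are hypothetical.

```shell
# Hypothetical helper: sample host metrics over fixed intervals *during* a run.
sample_during() {
  local interval=${INTERVAL:-5} count=${COUNT:-3}
  mpstat -P ALL "$interval" "$count" > mpstat.log &
  # -y drops iostat's first (since-boot) report.
  iostat -y -dxmh /dev/md0 "$interval" "$count" > iostat.log &
  # vmstat has no such switch: ignore its first line, which is since boot.
  vmstat -w -S m "$interval" "$count" > vmstat.log &
  "$@"   # the load generator, e.g. an ab command line
  wait   # let the samplers finish their last interval
}
```

For example: sample_during ab -k -l -s 60 -c 256 -n 1000 http://localhost/stress.php. If the ab run finishes in less than INTERVAL x COUNT seconds, the later samples will show the idle tail rather than the load.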
  • Additional information request. Do you have any SSD or NVMe devices on any MySQL production host? Post the following on pastebin.com and share the links. From your SSH root login, the text results of: B) SHOW GLOBAL STATUS; after at least 24 hours of uptime, C) SHOW GLOBAL VARIABLES;, D) SHOW FULL PROCESSLIST;, E) a complete MySQLTuner report. Optional but very helpful, if available: htop or top output for the most active apps, ulimit -a for the Linux/Unix limits, and iostat -xm 5 3 for IOPS by device and core/CPU count, for a server workload tuning analysis to provide suggestions. Commented Mar 7, 2020 at 18:25
  • On your mpstat -P ALL, if you add 5 3 you get a summary plus usage every 5 seconds, for a total of 3 cycles of data. This works for some of the other tools as well; iostat in particular has these options. Commented Mar 7, 2020 at 18:35
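The outputs requested in the first comment can be gathered in one pass. A minimal sketch, assuming the mysql client can log in without a password prompt (for example as root with socket authentication or a ~/.my.cnf); collect_mysql_diag and the output file names are hypothetical, and the mysqltuner command name is an assumption about how the tool is installed.

```shell
# Hypothetical helper: save the diagnostics requested in the comment to one directory.
collect_mysql_diag() {
  local out=${1:-mysql-diag}
  mkdir -p "$out"
  mysql -e 'SHOW GLOBAL STATUS'    > "$out/global_status.txt"
  mysql -e 'SHOW GLOBAL VARIABLES' > "$out/global_variables.txt"
  mysql -e 'SHOW FULL PROCESSLIST' > "$out/processlist.txt"
  ulimit -a                        > "$out/ulimit.txt"
  iostat -xm 5 3                   > "$out/iostat.txt"
  # Only runs MySQLTuner if a "mysqltuner" command exists (assumed name).
  if command -v mysqltuner >/dev/null; then
    mysqltuner > "$out/mysqltuner.txt"
  fi
}
```

As the comment notes, SHOW GLOBAL STATUS is most useful after the server has been up for at least 24 hours.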

1 Answer


It is good that you measure response time, even for this microbenchmark. Host-level utilization metrics do not tell the whole story, and definitely not here.

Performance degraded well before 256 threads. Requests per second stopped scaling after 32 threads: it stayed flat at about 163 to 166 for 32, 64, and 128 threads, while the mean time per request roughly doubled with every doubling of concurrency. That happened without a corresponding increase in CPU utilization or iowait, so there is likely contention on some other resource.
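ab derives its mean "Time per request" from the concurrency level c and the measured throughput X (requests per second), so those two columns are not independent. A worked check against the numbers in the question:

```latex
\bar{T} = \frac{1000\,c}{X}\ \text{ms}
\qquad
c = 32:\ \frac{1000 \times 32}{164.37} \approx 194.7\ \text{ms}
\qquad
c = 128:\ \frac{1000 \times 128}{162.75} \approx 786.5\ \text{ms}
```

With X stuck near 163 to 166 requests per second, each doubling of c roughly doubles the time per request: the extra clients are waiting in a queue, not getting more work done. This is Little's law (L = λW) applied to a closed-loop load generator.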

Keep an open mind and check everything for utilization and saturation. A very incomplete list:

  • It generates random numbers; does that exhaust and block /dev/random?
  • Does it use any shared resources concurrently, like flocks or UNIX IPC?
  • If printing variables, where does this output go, and how slow is the I/O?
  • How are the many worker threads managed? In other words, which Apache MPM are you using and how is it configured?
  • Does this benchmark use a database? DBMS tuning is a topic of its own; look into that as well.
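A few of these checks can be scripted on Linux. A minimal sketch, assuming the standard /proc files; check_contention and the PROC override (which lets it read a copy of /proc) are hypothetical. On Debian and Ubuntu the Apache control command is usually apache2ctl rather than apachectl.

```shell
# Hypothetical helper: quick utilization/saturation checks for the list above.
check_contention() {
  local proc=${PROC:-/proc} ctl
  # Entropy pool level; values stuck near zero during a run would suggest
  # that readers of /dev/random are blocking (mainly on older kernels).
  if [ -r "$proc/sys/kernel/random/entropy_avail" ]; then
    printf 'entropy_avail=%s\n' "$(cat "$proc/sys/kernel/random/entropy_avail")"
  fi
  # Lines in /proc/locks: held flock/POSIX locks plus "->" lines for waiters.
  if [ -r "$proc/locks" ]; then
    printf 'file_locks=%s\n' "$(awk 'END { print NR }' "$proc/locks")"
  fi
  # System V semaphores, one form of UNIX IPC that can serialize workers.
  if command -v ipcs >/dev/null; then
    ipcs -s
  fi
  # Which MPM Apache is running: the "Server MPM:" line of -V.
  for ctl in apachectl apache2ctl; do
    if command -v "$ctl" >/dev/null; then
      "$ctl" -V 2>/dev/null | grep 'Server MPM' || true
      break
    fi
  done
}
```

Running it once at idle and again during the 256 and 512 thread runs makes any change easy to spot.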

As this is Linux, where the performance tooling is good, consider sampling everything on CPU and looking at the call graphs. See Brendan Gregg's Linux performance notes for how to make on-CPU flame graphs. Use eBPF when you can, and perf record on older kernels. Read the graphs to see where the time is being spent: in your code, in the runtimes and database, or in the operating system. For example, you might see file system functions if many of the sampled stacks are doing file I/O.
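A minimal sketch of the perf route, following the steps in Gregg's flame graph notes: record stacks on all CPUs at 99 Hz while the benchmark runs, then fold and render them with his FlameGraph scripts (stackcollapse-perf.pl and flamegraph.pl). flame_during, FG_DIR, and SECS are hypothetical names; perf usually needs root or a relaxed kernel.perf_event_paranoid setting.

```shell
# Hypothetical helper: CPU flame graph of the whole host while a command runs.
# Assumes the FlameGraph repository is checked out in $FG_DIR.
flame_during() {
  local secs=${SECS:-30} fg=${FG_DIR:-./FlameGraph}
  # Sample all CPUs (-a) with call graphs (-g) at 99 Hz for $secs seconds.
  perf record -F 99 -a -g -o perf.data -- sleep "$secs" &
  "$@"   # the load generator, e.g. an ab command line
  wait   # let perf finish recording
  perf script -i perf.data > out.perf
  "$fg/stackcollapse-perf.pl" out.perf > out.folded
  "$fg/flamegraph.pl" out.folded > flame.svg
}
```

Comparing flame.svg from a 128 thread run with one from a 512 thread run should show whether the extra time goes to PHP, Apache, locking, or the kernel.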

Also on the topic of looking at all the things, consider getting netdata going. It gathers a large number of data points with minimal configuration; in my opinion, the built-in alert set alone is worth the effort of installing it. netdata can also be pointed at Apache httpd mod_status, so you can graph connections and workers alongside host metrics.
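The same mod_status data can also be read directly during a run. A minimal sketch, assuming mod_status is enabled and /server-status is reachable from localhost; worker_status is a hypothetical helper name. The ?auto form of the status page returns plain "Key: value" lines, including BusyWorkers and IdleWorkers.

```shell
# Hypothetical helper: print Apache busy/idle worker counts from mod_status.
worker_status() {
  local url=${1:-http://localhost/server-status?auto}
  curl -fsS "$url" | awk -F': ' '/^(BusyWorkers|IdleWorkers):/ { print $1 "=" $2 }'
}
```

Polling it (for example once a second) during the 256 and 512 thread runs would show whether BusyWorkers reaches the configured worker limit. If it does, additional clients wait for a free worker, which would fit the flat throughput.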
