
The server I am running has 4 vCPUs, 14 GB of RAM and 250 GB of storage. Averaged per minute over a working day (7 hours), it receives around 400-500 requests per minute.

I have increased the php-fpm settings to max children 40, start servers 10, min spare servers 10 and max spare servers 30, with the process manager set to on-demand.
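For reference, the pool stanza looks roughly like the sketch below (the file path, PHP version and idle timeout are written from memory, so treat them as approximations). As I understand it, under pm = ondemand only pm.max_children and pm.process_idle_timeout are actually used; the start/min/max spare values only apply to pm = dynamic.

; /etc/php/8.1/fpm/pool.d/www.conf  (path and version assumed)
[www]
pm = ondemand
pm.max_children = 40
pm.process_idle_timeout = 10s   ; assumed value
; the following are only honoured when pm = dynamic
pm.start_servers = 10
pm.min_spare_servers = 10
pm.max_spare_servers = 30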

Using atop -C I can see that most php-fpm processes consume 1-2% CPU, but fairly constantly 2-3 processes climb to 66-89% CPU.

When this becomes 4-5+ processes, total CPU goes to 350%+, and this is under a load of only around 400-500 requests per minute.

Is there a way I can identify what is causing the php-fpm processes to consume so much CPU? I am using PgBouncer to pool SQL connections. It is a Laravel project and the SQL queries are not the best, since it uses the Active Record pattern, but using over 80% CPU (extremely regularly) across multiple processes tells me something is seriously unoptimised or wrong. Yet from the nginx access log I can't see anything that should be causing it during the peak CPU spikes - they look like casual requests, no exports etc.

Here is an example of what I see below: at least 3 processes are typically over 60% CPU all the time, and the request rate was around 200 per minute during this capture, so below average. The nginx access log shows nothing out of the ordinary.

Example of full process list

Any help or ideas on how I can debug this further would be appreciated. I am waiting to get Azure Application Insights into production to attempt to isolate the issue.

The additional php processes are likely spawned by supervisor, as 5 worker processes run for the queues (configuration sketched below).
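The worker programme is set up roughly like this (the programme name, artisan path and worker flags are approximations, not copied verbatim from the server):

; /etc/supervisor/conf.d/laravel-worker.conf  (path assumed)
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/artisan queue:work --sleep=3 --tries=3   ; path assumed
numprocs=5
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/laravel-worker.log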

I am using the filesystem driver for sessions, and after more research I can see the same error in the nginx error log as in this SO question: https://stackoverflow.com/questions/35891537/laravel-maximum-execution-time-of-30-seconds-exceeded-in-vendor-symfony-finder
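Since that question is usually attributed to the file session driver's garbage collection sweeping a very large session directory, a quick sanity check (assuming Laravel's default storage path) is to count the session files:

# run from the project root; path is Laravel's default file session store
find storage/framework/sessions -type f | wc -l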

Output of vmstat -w 5 for close to 2 minutes:

--procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
 r  b   swpd    free     buff    cache   si  so  bi    bo    in    cs  us sy id wa st
 3  1  60868  302760  1291416  8474048   0   0   3    94     3     2   7  9 83  1  0
 3  0  60868  305652  1291420  8474100   0   0   0   184  3032 11293  22 40 37  0  0
 6  0  60868  307456  1291420  8474152   0   0   0   102  3043 12776  26 40 33  0  0
 1  0  60868  309116  1291420  8474436   0   0   0   235  2904 13905  26 39 34  1  0
 2  0  60868  301588  1291428  8476964   0   0   0  1245  2935 15069  30 40 29  1  0
 3  1  60868  281128  1291432  8490560   0   0   0   196  2627 13892  31 40 29  1  0
 2  0  60868  303012  1291432  8484872   0   0   0   159  3759 13004  21 37 40  2  0
 9  0  60868  301080  1291436  8479384   0   0   0   151  3281 12728  24 41 35  0  0
 5  0  60868  313884  1291436  8478624   0   0   0    94  2999 13096  27 39 33  0  0
 2  0  60868  310636  1291444  8477260   0   0   0    70  3204 11980  24 36 40  0  0
 2  0  60868  309108  1291448  8477488   0   0   0  1542  2982 11453  20 29 50  0  0
 1  0  60868  305752  1291448  8477648   0   0   0    78  3036  9900  18 28 54  0  0
 2  0  60868  303772  1291448  8477816   0   0   0   133  2408  8044  14 18 67  0  0
 3  0  60868  295540  1291448  8477884   0   0   0   158  2482  8559  16 22 61  0  0
 1  0  60868  295684  1291452  8477888   0   0   0   105  2985 10358  20 28 52  0  0
 0  0  60868  295620  1291460  8477884   0   0   0    42  2352  7613  13 17 69  0  0
 1  0  60868  296772  1291464  8478040   0   0   0   134  2751 10899  14 19 67  0  0
 1  0  60868  279304  1291468  8495716   0   0   0   534  2801  9469  15 20 63  2  0
 3  0  60868  304536  1291472  8485952   0   0   0    59  3420 11658  17 22 59  2  0
 1  0  60868  297360  1291476  8480584   2   0   2    97  3079 11568  22 30 48  0  0
 1  0  60868  327652  1291476  8479708   0   0   0   220  2777  8600  16 27 56  1  0

I am unsure how accurate these figures are, since the App Service Plan (ASP) uses "virtual disks" in Azure (the app runs as a Docker container).

  • What's the question? PHP will cause CPU load when you give it tasks to do. If you want to avoid CPU load, use caching - there's no other way. A good technique is employing Varnish Cache. Commented Dec 20, 2023 at 15:33

2 Answers


Edit: It seems you use PostgreSQL; the equivalent of the MySQL command suggested below is this one:

postgres=# select * from pg_stat_activity; 

If you suspect the SQL layer is overloaded and unoptimized requests are taking too long, you may be able to see it by running this query as root on MySQL:

SHOW FULL PROCESSLIST; 

Looking at the time column you can identify long-running requests or even hung ones.
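On the PostgreSQL side there is no single "time" column, so a sketch of an equivalent check is to compute the runtime from query_start, for example:

select pid,
       now() - query_start as runtime,
       state,
       left(query, 80) as query
from pg_stat_activity
where state <> 'idle'
order by runtime desc;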

  • Thanks! I'm using an Azure App Service (ASP) and my PostgreSQL instance is a separate resource; there shouldn't be any PostgreSQL services running on this box, only PHP connecting outbound via the PDO driver to run queries. PostgreSQL runs at around 30% CPU and 25% memory on another machine, and the PDO driver connects to a PgBouncer pooler to allow pooled connections for speed. Commented Dec 20, 2023 at 13:59
  • I updated my question to include the full screenshot of all the processes running on the ASP rather than a cropped one. My major concern is how the php-fpm processes consistently exceed 70% CPU under a load of only around 400 requests per minute, as it has caused other processes like supervisor to crash. Commented Dec 20, 2023 at 14:01
  • I mean, the high CPU load could be caused by some heavy or malformed requests; my answer suggests looking for these requests (if that is the case) and perhaps pinpointing the consuming process or component. IMHO there is no direct relationship between the load observed on the PostgreSQL side and the one on php-fpm. A quick win could be to switch your CPU governor from "dynamic" (if that is what it is) to "ondemand"; it may be surprising, but the latter gives more performance. Commented Dec 20, 2023 at 15:05

If "top" is showing php-fpm as using most CPU, then there's probably very little load coming from the DBMS (and subsequent comments confirmed this is elsewhere).

Looking at the stats here (while I'm not familiar with the version of top you are using), there seems to be very high memory usage even though pages aren't swapping much. You really need to reduce that. "Around 400-500 requests per minute" is not a huge volume, particularly given the size of this server, but the capacity issue (which causes the performance issue and then gets into a feedback loop) is about the number of requests being handled at any one time (arrival rate × response time).
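As a rough illustration of arrival rate × response time (the response times below are invented for the arithmetic, not measured from your system):

500 requests/minute ÷ 60 ≈ 8.3 requests/second
8.3 req/s × 0.2 s average response time ≈ 2 requests in flight
8.3 req/s × 3 s average response time ≈ 25 requests in flight

The same arrival rate is harmless while responses are fast, but once response times stretch, the concurrency quickly exceeds what 4 vCPUs can serve - which is the feedback loop described above.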

Next on the list is checking that your opcache is configured and sized correctly.
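A rough way to check the sizing is to dump the relevant fields of opcache_get_status() from inside the FPM pool (a sketch only; note that running it from the CLI reports the CLI's separate opcache, so it needs to execute through the web server or a tool such as cachetool):

&lt;?php
// opcache-check.php - drop in a web-accessible path temporarily and remove afterwards
$status = opcache_get_status(false); // false = omit the per-script list

$mem   = $status['memory_usage'];
$stats = $status['opcache_statistics'];

printf("used memory:    %.1f MB\n", $mem['used_memory'] / 1048576);
printf("free memory:    %.1f MB\n", $mem['free_memory'] / 1048576);
printf("wasted memory:  %.1f MB (%.1f%%)\n", $mem['wasted_memory'] / 1048576, $mem['current_wasted_percentage']);
printf("cached scripts: %d of max %d keys\n", $stats['num_cached_scripts'], $stats['max_cached_keys']);
printf("hit rate:       %.2f%%\n", $stats['opcache_hit_rate']);
printf("oom restarts:   %d\n", $stats['oom_restarts']);
printf("last restart:   %s\n", $stats['last_restart_time'] ? date('c', $stats['last_restart_time']) : 'never');

If free memory is near zero or oom_restarts is non-zero, the cache is being evicted and scripts are recompiled under load, which can show up as exactly this kind of php-fpm CPU burn.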

If these don't resolve your issue then you'll need to start profiling the code. There is a cost to running the data capture for a profiler. You would be advised to split off a small proportion of the traffic to a second host and only run the profiler there.

  • Thank you! Opcache is configured, and I checked using opcache_get_status(), which yields a tonne of files that have been cached. I'm not sure the memory figure in atop is accurate, as when I run free I see this: i.imgur.com/oHwTrPp.png which suggests I have 9 GB of memory available. I can run a profiler on a second deployment slot, but it seems this is something that happens under high load (which, I agree, is not that many concurrent requests). Commented Dec 21, 2023 at 9:20
  • You need to check how full the opcache is / how often it is getting recycled. Commented Dec 21, 2023 at 12:45
  • How would one go about checking this? In the output above, a lot of the cached files have 0 hits, but some do have a considerable number of hits. Commented Dec 22, 2023 at 8:59
  • From used_memory, free_memory and last_restart_time returned by opcache_get_status Commented Dec 22, 2023 at 9:21
