I'm facing high load spikes on a Linux server (Ubuntu 18.04, 16 cores, 8 GB RAM). It is a web server running Apache 2.4, php7.2-fpm and memcached; it runs no database services (those are provided by a different server).
The spikes are very brief and last only a few seconds. This is what top shows:
At first I thought the cause was abnormal CPU usage but, as shown in the picture, no process is using much CPU, and RAM usage is relatively low (25%). Instead, the problem seems to be the wa (I/O wait) values, which are very high!
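Because the spikes last only a few seconds, it can help to log wa once per second so it can be lined up against other measurements. The following is a minimal sketch, assuming Python 3 is available on the server. It derives the iowait share from two snapshots of /proc/stat; the function names (parse_cpu_line, iowait_percent, watch) are illustrative and not part of any existing tool.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat content as ints."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two snapshots."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    # Fields: user nice system idle iowait irq softirq steal.
    # Guest time is already included in user/nice, so only the
    # first eight counters are summed.
    total = sum(b[:8]) - sum(a[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (b[4] - a[4]) / total


def read_stat():
    """Read the current content of /proc/stat (Linux only)."""
    with open("/proc/stat") as f:
        return f.read()


def watch(interval=1.0, samples=60):
    """Print a timestamped iowait percentage every `interval` seconds."""
    prev = read_stat()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_stat()
        print(time.strftime("%H:%M:%S"), "wa=%.1f%%" % iowait_percent(prev, cur))
        prev = cur
```

Calling watch(interval=1.0, samples=60) prints one line per second for a minute; redirecting that to a file gives a timeline to compare with the times at which the spikes occur.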
I tried to find out whether this is a storage I/O problem using iotop, but when the spikes happen there seems to be no increase in I/O reads or writes.
The spikes happen many times a day and, after some monitoring, I saw that they happen at specific times. I checked whether any processes are scheduled to run at those times but haven't found anything.
I know that high load can also be caused by high bandwidth usage, but how can I monitor this?
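One way to watch bandwidth without extra tools is to sample the per-interface byte counters the kernel exposes in /proc/net/dev. The following is a minimal sketch, assuming Python 3 on Linux; the function names (parse_net_dev, throughput, watch) are illustrative, and interface names will differ per system.

```python
import time


def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a regular 8 receive + 8 transmit counter row
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, seconds):
    """Bytes per second (rx, tx) per interface between two snapshots."""
    a = parse_net_dev(before)
    b = parse_net_dev(after)
    rates = {}
    for name, (rx, tx) in b.items():
        if name in a:
            rates[name] = ((rx - a[name][0]) / seconds,
                           (tx - a[name][1]) / seconds)
    return rates


def read_net_dev():
    """Read the current content of /proc/net/dev (Linux only)."""
    with open("/proc/net/dev") as f:
        return f.read()


def watch(interval=1.0, samples=60):
    """Print timestamped rx/tx rates for every interface."""
    prev = read_net_dev()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_net_dev()
        stamp = time.strftime("%H:%M:%S")
        for name, (rx, tx) in sorted(throughput(prev, cur, interval).items()):
            print(stamp, name, "rx=%.0f B/s tx=%.0f B/s" % (rx, tx))
        prev = cur
```

Running watch() during a spike shows whether traffic on any interface rises at the same moment as the load.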
I'm not very experienced with iostat or sar. Are they "the right way" to troubleshoot my problem?
As Matthew Ife suggested, I ran ps -ALo pid,tid,comm,wchan during a high-load moment, and it gives some interesting output:
    35227 35227 kworker/12:2 worker_thread
    53730 53730 kworker/7:0 worker_thread
    57306 57306 php-fpm7.2 poll_schedule_timeout
    57348 57348 php-fpm7.2 poll_schedule_timeout
    57988 57988 php-fpm7.2 poll_schedule_timeout
    58251 58251 php-fpm7.2 poll_schedule_timeout
    60181 60181 kworker/5:2 worker_thread
    62158 62158 kworker/2:1 worker_thread
    62169 62169 php-fpm7.2 poll_schedule_timeout
    65001 65001 php-fpm7.2 poll_schedule_timeout
    69262 69262 php-fpm7.2 poll_schedule_timeout
    69647 69647 kworker/6:1 worker_thread
    72110 72110 php-fpm7.2 call_rwsem_down_write_failed
    72638 72638 php-fpm7.2 skb_wait_for_more_packets
    72845 72845 php-fpm7.2 call_rwsem_down_write_failed
    72848 72848 php-fpm7.2 call_rwsem_down_write_failed
    72850 72850 php-fpm7.2 skb_wait_for_more_packets
    72892 72892 php-fpm7.2 skb_wait_for_more_packets
    72909 72909 kworker/u256:2 worker_thread
    72940 72940 kworker/u256:4 get_write_access
    73353 73353 top poll_schedule_timeout
    73367 73367 php-fpm7.2 locks_lock_inode_wait
    73659 73659 php-fpm7.2 call_rwsem_down_write_failed
    73950 73950 php-fpm7.2 skb_wait_for_more_packets
    73953 73953 php-fpm7.2 call_rwsem_down_write_failed
    74259 74259 php-fpm7.2 skb_wait_for_more_packets
    74345 74345 php-fpm7.2 skb_wait_for_more_packets
    74436 74436 kworker/13:1 worker_thread
    74481 74481 kworker/u256:1 get_write_access
    74519 74519 php-fpm7.2 skb_wait_for_more_packets
    74522 74522 php-fpm7.2 skb_wait_for_more_packets
    74576 74576 php-fpm7.2 call_rwsem_down_write_failed
    74578 74578 php-fpm7.2 skb_wait_for_more_packets
    74603 74603 php-fpm7.2 locks_lock_inode_wait
    74849 74849 php-fpm7.2 skb_wait_for_more_packets
    75085 75085 php-fpm7.2 skb_wait_for_more_packets
    75088 75088 php-fpm7.2 call_rwsem_down_write_failed
    75100 75100 php-fpm7.2 call_rwsem_down_write_failed
    75171 75171 php-fpm7.2 skb_wait_for_more_packets
    75283 75283 kworker/2:2 worker_thread
    ....

Does this mean that there's some problem with filesystem or storage?
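A long listing like this is easier to read once the wait channels are tallied per command. The following is a minimal sketch, assuming Python 3 and that the text comes from ps -ALo pid,tid,comm,wchan; the function name wchan_counts and its command parameter are illustrative.

```python
from collections import Counter


def wchan_counts(ps_output, command=None):
    """Count wait channels in `ps -ALo pid,tid,comm,wchan` output.

    If `command` is given, only threads whose comm matches it are counted.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split()
        # Expected layout: pid tid comm... wchan.
        # The header row and truncated rows are skipped.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        comm = " ".join(parts[2:-1])  # comm may itself contain spaces
        wchan = parts[-1]
        if command is None or comm == command:
            counts[wchan] += 1
    return counts
```

For example, wchan_counts(text, command="php-fpm7.2").most_common() lists the channels the PHP workers are sleeping in, most frequent first. Comparing a tally taken during a spike with one taken at a quiet moment shows which channels appear only under load.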
Run ps -ALo pid,tid,comm,wchan and provide the output. If you're using NFS it may relate to that. Alternatively, run top and add a filter with o, set to S=D, to see these stuck processes.
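On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so capturing exactly those threads during a spike is useful. The following is a minimal sketch of the same filter outside top, assuming Python 3 and the procps version of ps; the column order passed to ps and the names PS_CMD, uninterruptible and snapshot are choices made for this example.

```python
import subprocess

# comm is placed last so that command names containing spaces parse cleanly;
# wchan:32 widens the column so long kernel function names are not cut off.
PS_CMD = ["ps", "-eLo", "state,pid,tid,wchan:32,comm"]


def uninterruptible(ps_output):
    """Return (pid, tid, wchan, comm) for every thread in state D."""
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 4)
        # The header row is skipped because its second column is not numeric.
        if len(parts) == 5 and parts[0] == "D" and parts[1].isdigit():
            stuck.append((int(parts[1]), int(parts[2]),
                          parts[3], parts[4].strip()))
    return stuck


def snapshot():
    """Run ps once and return the threads currently in state D."""
    result = subprocess.run(PS_CMD, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return uninterruptible(result.stdout)
```

Calling snapshot() in a loop with a short sleep, and logging only non-empty results together with a timestamp, records which threads were blocked at the moment of each spike.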