I've been encountering an issue that has been bothering me for a while now.
I have a Google Cloud VM running Ubuntu Server 18.04 with several containers, including a PostgreSQL server, a Redis server, and a couple of my own applications. These are all managed through a docker-compose setup.
In PostgreSQL, there are a couple of tables where INSERTs occur frequently throughout the day, with DELETEs happening only at night.
Periodically, I've noticed significant CPU and memory spikes lasting 2-4 minutes, which impact the performance of my applications. Initially, I assumed it was caused by my application, as I observed high activity in both my application and PostgreSQL using basic monitoring commands such as top or htop during these spikes.
After investing time in optimizing my software, it's now performing much better, and the load on the database is more evenly distributed over time, resulting in lower overall CPU usage.
However, I still encounter these sudden CPU and memory spikes, albeit without affecting my optimized software and database as significantly. During these spikes, I observe:
- Low CPU and memory usage by all of my containers (verified through docker stats and monitoring tools like top, htop, and glances)
- A rapid increase in memory and CPU usage on the Ubuntu host
- No specific process or line in the monitoring tools that points to the cause
Additionally, sporadically, there's a kworker process consuming a small percentage of CPU, but with negligible memory usage, which doesn't seem to be directly related to these spikes.
I would appreciate any assistance in identifying the cause of this behavior and determining if there's a way to monitor it using specific commands without relying solely on overall memory/CPU usage. I can't identify which component is at fault here, sadly.
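As a starting point for that kind of monitoring, here is a minimal sketch (not a definitive tool) that splits total CPU time into user, system, iowait, irq, softirq, and steal by sampling the aggregate `cpu` line of `/proc/stat`. It assumes a Linux host with the standard `/proc/stat` field order; the function names and the 5-second interval are hypothetical choices made for illustration.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(text):
    """Return cumulative tick counts per CPU state from /proc/stat text."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # aggregate line, not cpu0, cpu1, ...
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Pad with zeros in case the kernel reports fewer fields.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def read_sample(path="/proc/stat"):
    """Read one sample from the host (path is an assumption for Linux)."""
    with open(path) as f:
        return parse_cpu_line(f.read())


def watch(interval=5.0, count=12):
    """Print a per-state breakdown every `interval` seconds, `count` times."""
    prev = read_sample()
    for _ in range(count):
        time.sleep(interval)
        cur = read_sample()
        pct = cpu_percentages(prev, cur)
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"{k}={v:.1f}%" for k, v in pct.items()))
        prev = cur
```

Calling `watch()` on the host during a spike would show whether the extra CPU time lands in `system`, `iowait`, or `steal` rather than `user`. High `steal` would point at the hypervisor rather than anything inside the VM, while high `system` or `iowait` would point at kernel or disk activity that per-process views can underreport.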
I'm attaching an example of the CPU and memory behavior. IO doesn't seem to be a factor here: all SELECTs are lightweight, and INSERTs happen at a constant rate during the day. I can also confirm there's no ongoing VACUUM. In the first chart, the green line is total available memory. The second chart shows CPU usage.
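Since the chart tracks available memory, it may also help to separate page cache from memory that processes actually hold. The following is a minimal sketch, assuming the standard Linux `/proc/meminfo` format with values in kB; the function names are hypothetical, and the "used" figure is only an approximation because `Cached` also includes shared memory that cannot be reclaimed.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a {field_name: value} dict (kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def memory_breakdown(info):
    """Split total memory into free, cache, and an approximate 'used' figure."""
    total = info["MemTotal"]
    free = info["MemFree"]
    # Buffers, page cache, and reclaimable slab are mostly reclaimable.
    cache = (info.get("Buffers", 0)
             + info.get("Cached", 0)
             + info.get("SReclaimable", 0))
    return {
        "total_kb": total,
        "free_kb": free,
        "cache_kb": cache,
        "available_kb": info.get("MemAvailable", free + cache),
        "used_excl_cache_kb": total - free - cache,
    }


def read_breakdown(path="/proc/meminfo"):
    """Read the current breakdown from the host (path assumed for Linux)."""
    with open(path) as f:
        return memory_breakdown(parse_meminfo(f.read()))
```

Logging `read_breakdown()` at intervals across a spike would show whether the change is in `cache_kb` (kernel page cache growing or being dropped) or in `used_excl_cache_kb` (memory held by processes or the kernel itself).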
Thank you
Link to stats image: https://i.sstatic.net/Ub1XT.png
top output when a spike happens.