
I'm running a service on a GCP VM that intermittently has very low CPU performance. Commands like grep and tail on text files, which normally produce output instantly, often take up to a minute to start showing anything.

This has happened about 5 or 6 times over the last month and causes severe instability on our end, which impacts customers. I cannot overstate the gravity of this matter.

Some details that may be relevant:

  • Machine is a C2D HighCPU 16 instance type (AMD EPYC Milan)
  • Machine is located in europe-west1-b
  • Image is an Ubuntu 18 32-bit image. It has no services, but I do need a 32-bit image
  • Sometimes (especially during these issues), a single core is at 100% usage, but no process shows up as consuming it. I have attached screenshots

Normal usage

Usage spike: the core at 100% changes every few seconds. No process shows more than 25% CPU usage in htop, but one core is at 100%.
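
When a core sits at 100% but no process accounts for it, the time may be going to the kernel or to I/O wait rather than to user code. Below is a minimal Python sketch, assuming a Linux /proc/stat with the standard field order, that samples the per-core counters twice and prints how each core's time was split. The function names (parse_proc_stat, breakdown) are made up for illustration.

```python
import os
import time

# First eight per-core columns of /proc/stat, in kernel order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_name: {field: ticks}} for the per-core lines of /proc/stat."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; the aggregate line is just "cpu".
        if len(parts) > len(FIELDS) and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = map(int, parts[1:1 + len(FIELDS)])
            cores[parts[0]] = dict(zip(FIELDS, values))
    return cores


def breakdown(before, after):
    """Return {cpu_name: {field: percent}} for the interval between two samples."""
    result = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        deltas = {f: new[f] - old[f] for f in FIELDS}
        total = sum(deltas.values())
        if total > 0:
            result[name] = {f: 100.0 * d / total for f, d in deltas.items()}
    return result


def main(interval=1.0):
    if not os.path.exists("/proc/stat"):
        print("no /proc/stat here; this sketch only works on Linux")
        return
    with open("/proc/stat") as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open("/proc/stat") as fh:
        second = parse_proc_stat(fh.read())
    for name, pct in sorted(breakdown(first, second).items()):
        # Only show categories that took at least 1% of the interval.
        print(name, " ".join(f"{f}={v:.0f}%" for f, v in pct.items() if v >= 1))


if __name__ == "__main__":
    main()
```

A core that shows mostly system or iowait during a spike points at kernel work or disk waits, which htop's default process list does not attribute to any user process.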

Has anyone ever had this issue? Does anyone have any suggestions on how I could go about solving this?

Very much appreciated

  • Have you gone through these troubleshooting steps? Also, can you share details such as the name of the process that is consuming the most resources, your disk size, and how much of the disk is in use? Commented Jan 8, 2024 at 6:27
  • Part 1: Google does not provide 32-bit images for Compute Engine. I am not aware of any testing for 32-bit images. I am assuming that you imported an image into Google Cloud. Therefore, your Linux kernel must have certain kernel build options enabled. See this link and verify that each required setting is correct. Edit your post to include how your Linux kernel was built. Otherwise, you have an unsupported environment and that can cause all kinds of weird behavior. Commented Jan 8, 2024 at 6:35
  • Part 2: 32-bit versions require swap space. The c2d-standard-16 has 64 GB of memory. A 32-bit system cannot normally access that much memory. How much memory does your Linux system show and how much is used/free? Your screenshot shows 195 MB of swap space used, which means you are over-allocating memory. At a certain point, the system runs low on memory and begins to thrash, swapping processes out to swap space. We will need more details to debug this problem. However, your problem might not be solvable. Commented Jan 8, 2024 at 6:41
  • @JohnHanley Part 1: each setting has those default values. gVNIC was enabled, as it's required on c2d as well. Part 2: It's not c2d-standard-16, it's c2d-highcpu-16, which has 32 GB memory (which is correctly shown in htop). No process is using 100% CPU during those spikes, according to htop. But the system becomes completely unusable during those spikes: greps don't work, tails don't work... Simple commands take forever to run, and then my services running on the VM crash. Any hints? Commented Jan 8, 2024 at 12:25
  • Example: I just did an apt-get update where it took 3 minutes to read the package lists. I've never seen performance this slow, and CPU usage is at 35% right now. Commented Jan 8, 2024 at 14:28

1 Answer


Running htop with kernel threads shown let me see that kswapd0 was using all the CPU, which hogged the server and made the terminal unresponsive.

Then I ran echo vm.swappiness=0 | sudo tee -a /etc/sysctl.conf to discourage swapping (this tells the kernel to avoid swapping rather than disabling swap outright, and the file is only read at boot or after sudo sysctl -p), and then echo 1 | sudo tee /proc/sys/vm/drop_caches to drop the page cache that kswapd0 was busy reclaiming.

This fixed the issue for now. I will continue to investigate why swap was used at all, since I'm using 14 of 32 GB of RAM, but otherwise the issue is fixed.
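
To check whether the setting actually took effect and whether the machine is still swapping, something like the following Python sketch may help. It assumes a Linux /proc with /proc/sys/vm/swappiness and the pswpin/pswpout counters in /proc/vmstat; the helper names are hypothetical and the one-second interval is arbitrary.

```python
import os
import time


def parse_vmstat(text):
    """Return {counter_name: value} from the text of /proc/vmstat."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two /proc/vmstat samples."""
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / seconds
        for name in ("pswpin", "pswpout")
    }


def read_text(path):
    with open(path) as fh:
        return fh.read()


def main(interval=1.0):
    for path in ("/proc/vmstat", "/proc/sys/vm/swappiness"):
        if not os.path.exists(path):
            print(f"{path} not found; this sketch only works on Linux")
            return
    # The value the kernel is using right now, not what sysctl.conf says.
    print("vm.swappiness =", read_text("/proc/sys/vm/swappiness").strip())
    first = parse_vmstat(read_text("/proc/vmstat"))
    time.sleep(interval)
    second = parse_vmstat(read_text("/proc/vmstat"))
    for name, rate in swap_rate(first, second, interval).items():
        print(f"{name}: {rate:.1f} pages/s")


if __name__ == "__main__":
    main()
```

Sustained non-zero pswpin/pswpout rates during a slowdown would suggest the system is still actively swapping; rates near zero with kswapd0 busy would suggest it is reclaiming cache instead.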

  • You have not fixed the problem. When the system now runs out of memory, it will crash instead of swap. Commented Jan 8, 2024 at 18:26
  • I see your point, but the system should not run out of memory at all. I'm at a constant 14/32 GB RAM usage. I'm not sure why swap was even happening. Do you have any hints on how I could go about investigating that? Commented Jan 8, 2024 at 18:59
  • The system is using 195 MB of swap now. Your assumptions are not correct. The fact that it should not be is not important, it is using swap space and that is the important detail. Commented Jan 8, 2024 at 19:00
  • @JohnHanley thanks for the perspective. Is there any way I can check which process triggered swap? And when? Commented Jan 9, 2024 at 14:01
  • Read /proc/meminfo. There are tools such as smem that will also help you. Commented Jan 10, 2024 at 3:58
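
As a rough illustration of the /proc/meminfo suggestion, here is a minimal Python sketch that prints the main memory and swap totals and then ranks processes by their current swap use, taken from the VmSwap line of /proc/<pid>/status. The function names are made up for illustration. Note that this shows which processes are sitting in swap now, not which one triggered swapping or when. LowTotal and LowFree are only reported by 32-bit kernels built with high-memory support; the sketch prints them when present, on the unconfirmed assumption that low-memory pressure could be relevant on a 32-bit image.

```python
import glob
import os
import re


def parse_kb_fields(text):
    """Return {name: kB} for 'Name:  123 kB' lines (/proc/meminfo, /proc/<pid>/status)."""
    fields = {}
    for match in re.finditer(r"^(\w+):[ \t]+(\d+) kB$", text, re.MULTILINE):
        fields[match.group(1)] = int(match.group(2))
    return fields


def process_name(status_text):
    """Return the command name from the text of /proc/<pid>/status."""
    match = re.search(r"^Name:\t(.*)$", status_text, re.MULTILINE)
    return match.group(1) if match else "?"


def swap_by_process():
    """Return [(swap_kB, pid, name)] for processes using swap, largest first."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited or is not readable
        swap_kb = parse_kb_fields(text).get("VmSwap", 0)
        if swap_kb > 0:
            pid = int(path.split("/")[2])
            rows.append((swap_kb, pid, process_name(text)))
    return sorted(rows, reverse=True)


def main():
    if not os.path.exists("/proc/meminfo"):
        print("no /proc/meminfo here; this sketch only works on Linux")
        return
    with open("/proc/meminfo") as fh:
        mem = parse_kb_fields(fh.read())
    # LowTotal/LowFree exist only on 32-bit highmem kernels; skipped otherwise.
    for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree", "LowTotal", "LowFree"):
        if key in mem:
            print(f"{key:>12}: {mem[key] // 1024} MiB")
    for swap_kb, pid, name in swap_by_process()[:10]:
        print(f"{swap_kb:>10} kB  pid {pid}  {name}")


if __name__ == "__main__":
    main()
```

Running it periodically (for example from cron) and keeping the output would give a crude timeline of when swap use grows, which is closer to the "when" part of the question.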
