I have developed a web application using Laravel and PostgreSQL as the database.
The main function of my application is to process large volumes of data from the database, with tables ranging from a few thousand records to 13 million in some cases. Essentially, it is a data analytics application where I have several workers simultaneously processing the same task. The number of workers varies between 1 (minimum) and 4 (maximum).
Observing the processes both on my local development machine with Docker and in production, the most demanding ones are the PostgreSQL workers, as my application’s workers remain at around 5% CPU usage, while the PostgreSQL workers reach approximately 80% CPU usage.
It is clear that the main bottleneck in my service is PostgreSQL since many queries are executed, some of them quite complex. Additionally, it is important to note that, in principle, both my application’s code and the SQL queries are correctly optimized.
The problem began when I deployed the service on a VPS with 4 vCores of CPU, 4 GB of RAM, and 4 GB of swap. It is a VPS with KVM virtualization running Ubuntu 24.04.
Locally, I use this project with Docker in two Ubuntu 22.04 containers for Laravel and PostgreSQL. When I run the workers of my service’s main process with the default configuration in the Docker containers, the processing speed is acceptable. However, when I run the same processes in production, the speed is 2 to 4 times slower than in the local environment.
For example, if each worker in my application can process 1,000 elements every 5 seconds locally, in production, it processes only 200 to 300 elements in the same timeframe.
My local machine is a MacBook Air M1 with 16 GB of RAM. The issue is that I don’t see RAM usage as a real limiting factor in production since the memory is not overloaded on the production machine. I believe the real bottleneck is CPU performance because, even using Docker virtualization on my local machine, it is still faster than my production server.
In production I ran the benchamark of CPU with sysbench --test=cpu run
, and his result was:
CPU speed: events per second: 995.85 General statistics: total time: 10.0007s total number of events: 9961 Latency (ms): min: 0.94 avg: 1.00 max: 2.02 95th percentile: 1.25 sum: 9990.51 Threads fairness: events (avg/stddev): 9961.0000/0.00 execution time (avg/stddev): 9.9905/0.00
My main question is: What could be causing the production performance to be significantly worse than on my development machine using Docker?
Could it be that the VPS hosting does not meet the CPU and disk performance expectations I had, despite being a production machine that, in principle, should have enough capacity to run my service?
Could it be due to a misconfiguration of PostgreSQL in production? I have already tested multiple different configurations, increasing parallelization and allowing more resource usage by PostgreSQL, but I have not achieved significant improvements.
I am open to any questions if further information is needed to evaluate the case.
Thank you.