We've got an Apache server that becomes unresponsive almost daily. By checking /server-status (mod_status) we can see that all 60 child processes are in a "W" (Sending Reply) state.
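For anyone wanting to track this over time rather than eyeballing the page, here is a minimal sketch of how the worker states could be counted. It assumes the machine-readable `/server-status?auto` output of mod_status, which contains a `Scoreboard:` line with one character per worker slot. The function names `count_states` and `is_saturated` and the sample text are made up for illustration; the default of 60 mirrors our `MaxClients`.

```python
from collections import Counter


def count_states(status_text):
    """Count worker slots per state from the 'Scoreboard:' line of ?auto output."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            # One character per slot, e.g. "W" = Sending Reply, "_" = waiting,
            # "." = open slot with no current process.
            return Counter(board)
    raise ValueError("no Scoreboard line found")


def is_saturated(status_text, max_clients=60):
    """True when at least max_clients workers are in the 'W' state."""
    return count_states(status_text)["W"] >= max_clients


if __name__ == "__main__":
    # Illustrative sample only, not real output from our server.
    sample = "BusyWorkers: 60\nIdleWorkers: 0\nScoreboard: " + "W" * 60 + "." * 196
    print(count_states(sample)["W"], is_saturated(sample))
```

Run from cron against the real status page, something like this could log the moment the "W" count starts climbing, which might help correlate the hang with a specific request.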
- If we run `service httpd restart`, everything goes back to normal and the problem goes away for a day or so.
- If, instead of restarting Apache, we kill every single child process, the problem remains (this is the only way for us to access /server-status, which then responds until all processes are back in a "W" state).
- To me it seems that our PHP scripts never finish when the problem starts happening, which made me think it was a MySQL, Solr, or PHP/Apache timeout problem.
- However...
- Solr/MySQL respond instantly.
- There are plenty of MySQL connections available (we use AWS-RDS, the max connections allowed is greater than the number of Apache processes).
- RAM is still fine (each process uses about 10 MB, so 60 processes use roughly 600 MB; there's still plenty free).
- PHP has `max_execution_time` set to "30".
- Apache `TimeOut` is set to "60".
- We don't use persistent MySQL connections.
- We do use `curl_setopt($conn, CURLOPT_FORBID_REUSE, 0)` to query Solr (I'm hoping this gets garbage collected properly by curl if the connection goes away).
- It still seems, though, that many processes never finish... I left one process running while killing all the others, and it stayed alive for 2 hours, still serving the exact same page (I could see this in /server-status), which normally takes 50 ms to respond.
- We don't use `set_time_limit(0)` or anything silly like that in our code.
- I assume that omitting `set_time_limit` means the scripts will finish after `max_execution_time`.
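One caveat about that last assumption, as far as I understand the PHP manual: on Unix-like systems `max_execution_time` counts only the time the script itself spends executing, and excludes time spent waiting on system calls, stream operations, or database queries. If that applies here, a script blocked on a socket might never hit the 30-second limit. The sketch below is not PHP; it is a small Python illustration of the gap between CPU time and wall-clock time, with `blocked_wait` as a hypothetical stand-in for a stalled network call.

```python
import time


def blocked_wait(seconds):
    """Hypothetical stand-in for a request blocked on I/O.

    Returns (wall_clock_seconds, cpu_seconds) spent inside the call.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(seconds)  # waiting uses wall-clock time but almost no CPU time
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = blocked_wait(0.5)
    print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

A limit measured against the CPU figure would let a process like this wait indefinitely, which would at least be consistent with a worker sitting on the same page for 2 hours.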
I had a theory that Apache's ListenBacklog was set too high and that whenever we killed the processes, 60 new ones were instantly started, all trying to respond to clients that had long gone away. This would explain why the problem went away when we restarted the server. But it seems ListenBacklog wasn't set, and hence the default of "511" would be in use. I tried killing all child processes several times in a row to flush the backlog, but the problem remains... all new requests to PHP pages take forever to respond (most don't respond at all).
PHP config:

    max_execution_time = 30
    max_input_time = 60
    safe_mode = off

Apache config:

    KeepAlive off
    <IfModule prefork.c>
      StartServers 8
      MinSpareServers 5
      MaxSpareServers 20
      ServerLimit 256
      MaxClients 60
      MaxRequestsPerChild 1000
    </IfModule>

I've run out of ideas... Any hints would be greatly appreciated!