I have a long-running CLI cron job written in PHP. After an inconsistent period of time, the script stops writing output to its log file, indicating that it has stopped executing.
However, the script is still running as confirmed with ps aux
.
The script isn't throwing any exceptions or outputting any errors. When it stalls, it doesn't seem to consume any more CPU.
Memory isn't an issue - the script uses ini_set("memory_limit", "1G");
and seems to freeze when the script is only using about 73 MB. The script is running on a capable dedicated machine (8 GB RAM, 8 CPUs, PHP 5.6.25) with plenty of free memory, disk space, inodes, /tmp space, available CPU, etc.
Using xdebug's function trace logging, I can see that the script simply stops at an unremarkable function call. A "normal" execution of xdebug writes something like TRACE END [2019-03-26 13:02:55]
to the end of the logfile and then exits. However, when the script freezes, this line does not appear and the script does not exit. For example:
"Stalled" xdebug function log, Example A:
1481.7659 73187512 -> in_array() /home/xyz/inc/util/email.inc:156 1481.7659 73187056 -> strpos() /home/xyz/inc/util/email.inc:170 1481.7659 73187208 -> mail() /home/xyz/inc/util/email.inc:181
The "stall" does not always happen at the same function.
"Stalled" xdebug function log, Example B:
1422.5710 73306784 -> strrpos() /home/xyz/inc/util/external-common.inc:106 1422.5710 73306840 -> substr() /home/xyz/inc/util/external-common.inc:108 1422.5710 73307008 -> shell_exec() /home/xyz/inc/util/external-common.inc:112
"Normal" xdebug function log:
55190.4859 35183552 -> WP_Object_Cache->__destruct() /home/xyz/public_html/wp/wp-includes/cache.php:0 55190.8278 34830224 -> wpdb->__destruct() /home/xyz/public_html/wp/wp-includes/wp-db.php:0 55193.1823 8672 TRACE END [2019-03-26 13:02:55]
This might be mere coincidence, but the only common thread that I can see is that it might have something to do with C's popen()
- it (or a relative) seems to be used by both PHP's mail()
and shell_exec()
. I don't think I have too many open processes (verified recently; don't have the command I used on-hand). The script only executes at most 5 shell_exec()
calls per second and each call takes less than a second to execute. (Plus, if this were a problem, wouldn't I receive some sort of error instead of this "stalling" behavior?)
What could be going on?
What should my next steps be?