In brief: the current Elastic Beanstalk log rotation configuration appears to be broken, causing service downtime in the form of 504 Gateway Timeout responses. Let's take a look.
Reproduction
We create the simplest possible Python WSGI application.
application.py
import time

def application(environ, start_response):
    # somewhat realistic response duration
    time.sleep(0.5)
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']
Zip it into application.zip. Then create an Elastic Beanstalk Python application and environment, and upload the archive. Make sure you use a key pair that you possess. Leave the other settings at their defaults. Wait until it's done (several minutes).
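(If you'd rather script the packaging step than zip by hand, a minimal sketch with Python's standard zipfile module could look like the following; it simply assumes application.py sits in the current directory.)

import zipfile

# build application.zip with application.py at the archive root, which is where
# Elastic Beanstalk's default WSGIPath expects to find it
with zipfile.ZipFile('application.zip', 'w', zipfile.ZIP_DEFLATED) as archive:
    archive.write('application.py')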
ssh into the underlying EC2 instance (the instance identifier is in EB's log) and type the following (this is httpd's logrotate post-action, see below):
sudo /sbin/service httpd reload
Then on your machine run:
siege -v -b -c 10 -t 10S http://your-test-eb.you-aws-region.elasticbeanstalk.com/
And while it runs, repeat the reload command a couple of times.
You should see something like the following:
** SIEGE 3.0.8
** Preparing 10 concurrent users for battle.
The server is now under siege...
HTTP/1.1 200 0.63 secs: 13 bytes ==> GET /
HTTP/1.1 200 0.65 secs: 13 bytes ==> GET /
HTTP/1.1 200 0.64 secs: 13 bytes ==> GET /
HTTP/1.1 200 0.60 secs: 13 bytes ==> GET /
...
Here's what happens when you run reload.
HTTP/1.1 504 0.06 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.07 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.08 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.10 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.11 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.66 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.19 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.20 secs: 0 bytes ==> GET /
HTTP/1.1 504 0.09 secs: 0 bytes ==> GET /
Then it recovers.
HTTP/1.1 200 1.25 secs: 13 bytes ==> GET /
HTTP/1.1 200 1.24 secs: 13 bytes ==> GET /
HTTP/1.1 200 1.26 secs: 13 bytes ==> GET /
...
Lifting the server siege.. done.

Transactions: 75 hits
Availability: 81.52 %
Elapsed time: 9.40 secs
Data transferred: 0.00 MB
Response time: 1.21 secs
Transaction rate: 7.98 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 9.68
Successful transactions: 75
Failed transactions: 17
Longest transaction: 4.27
Shortest transaction: 0.06
Note that ELB doesn't seem to have any effect on the problem; the same can be reproduced with two SSH sessions to the underlying EC2 instance and ab (the Amazon AMI doesn't have siege):
ab -v 4 -c 10 -t 10 http://your-test-eb.you-aws-region.elasticbeanstalk.com/
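If neither siege nor ab is handy where you're testing, a rough stand-in can be scripted in Python. The sketch below is only illustrative: the URL, worker count and duration are placeholders mirroring the siege invocation above, and it simply tallies response status codes so the burst of 504s becomes visible.

import collections
import threading
import time
import urllib.error
import urllib.request

URL = 'http://your-test-eb.you-aws-region.elasticbeanstalk.com/'
DURATION = 10   # seconds, mirrors siege's -t 10S
WORKERS = 10    # concurrent clients, mirrors siege's -c 10

counts = collections.Counter()
lock = threading.Lock()

def worker(deadline):
    # keep issuing GETs until the deadline and record each status code
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(URL, timeout=30) as response:
                status = response.status
        except urllib.error.HTTPError as error:
            status = error.code          # 4xx/5xx responses raise HTTPError
        except OSError:
            status = 'connection error'
        with lock:
            counts[status] += 1

deadline = time.time() + DURATION
threads = [threading.Thread(target=worker, args=(deadline,)) for _ in range(WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(dict(counts))   # per-status-code tally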
Cause
/etc/cron.hourly/cron.logrotate.elasticbeanstalk.httpd.conf
#!/bin/sh
test -x /usr/sbin/logrotate || exit 0
/usr/sbin/logrotate /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf
/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf
/var/log/httpd/* {
  size 10M
  missingok
  notifempty
  rotate 5
  sharedscripts
  compress
  dateext
  dateformat -%s
  create
  postrotate
    /sbin/service httpd reload > /dev/null 2>/dev/null || true
  endscript
  olddir /var/log/httpd/rotated
}
Notice the postrotate script. /sbin/service is just a System V wrapper for the scripts in /etc/init.d/. Its man page says:

service runs a System V init script in as predictable environment as possible, removing most environment variables and with current working directory set to /.
Note that reload is not a standard Apache maintenance command; it's the distro's downstream addition. Let's look at the init script, /etc/init.d/httpd. The relevant part follows:
reload() {
    echo -n $"Reloading $prog: "
    check13 || exit 1
    killproc -p ${pidfile} $httpd -HUP
    RETVAL=$?
    echo
}
As you can see, it sends the HUP signal to Apache, which is interpreted as "Restart Now":

Sending the HUP or restart signal to the parent causes it to kill off its children like in TERM, but the parent doesn't exit. It re-reads its configuration files, and re-opens any log files. Then it spawns a new set of children and continues serving hits.
The TERM explains the 504s pretty well. What should probably have been used instead is a graceful restart, which also re-opens the logs but doesn't terminate requests that are being served:

The USR1 or graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they're not serving anything). The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation of the configuration, which begins serving new requests immediately.

...

The code was written to both minimize the time in which the server is unable to serve new requests (they will be queued up by the operating system, so they're not lost in any event) and to respect your tuning parameters.
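To see the difference between the two signals for yourself, you can signal the Apache parent process directly on the instance. The following is only a sketch: the pidfile path is an assumption based on the stock Amazon Linux httpd layout (the same ${pidfile} the init script above uses), so adjust it if your setup differs.

import os
import signal
import sys

# assumed stock Amazon Linux location of Apache's pidfile; adjust if needed
PIDFILE = '/var/run/httpd/httpd.pid'

def restart_httpd(graceful=True):
    # mimic "service httpd graceful" (USR1) vs. "service httpd reload" (HUP)
    with open(PIDFILE) as handle:
        parent_pid = int(handle.read().strip())
    # USR1 lets children finish their current request; HUP kills them off immediately
    os.kill(parent_pid, signal.SIGUSR1 if graceful else signal.SIGHUP)

if __name__ == '__main__':
    # run as root on the instance; pass --hard to send HUP instead of USR1
    restart_httpd(graceful='--hard' not in sys.argv[1:])

Running this with the default (USR1) while siege is hammering the instance should not produce 504s, while --hard (HUP) reproduces the burst shown above.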
Workaround
It's possible to use .ebextensions to replace /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf. In the application's root directory create .ebextensions/10_logs.config with the following contents (it basically replaces "reload" with "graceful"):
files:
  "/etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.httpd.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      /var/log/httpd/* {
        size 10M
        missingok
        notifempty
        rotate 5
        sharedscripts
        compress
        dateext
        dateformat -%s
        create
        postrotate
          /sbin/service httpd graceful > /dev/null 2>/dev/null || true
        endscript
        olddir /var/log/httpd/rotated
      }
Then re-deploy your Elastic Beanstalk environment. Note, however, that with sub-second intervals between consecutive graceful restarts I was able to (sporadically) produce 503 Service Unavailable; this doesn't apply to log rotation, though, since with evenly spaced graceful restarts there were no errors.