Opened 11 years ago

Closed 10 years ago

#626 closed defect (fixed)

Open Socket Alerts on Reload

Reported by: Strahinja Kustudić Owned by: Valentin Bartenev <vbart@…>
Priority: minor Milestone:
Component: nginx-core Version: 1.6.x
Keywords: spdy, http2 Cc:
uname -a: Linux web1 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.6.1

Description

I am using Nginx 1.6.1 from the official Nginx repository and recently (I think starting from 1.6.1) we started getting these errors on reload:

2014/09/15 11:00:13 [alert] 15749#0: *418441897 open socket #786 left in connection 17 2014/09/15 11:00:13 [alert] 15749#0: *25679726 open socket #932 left in connection 36 2014/09/15 11:00:13 [alert] 15749#0: *250563297 open socket #108 left in connection 64 2014/09/15 11:00:13 [alert] 15749#0: *255280874 open socket #467 left in connection 108 2014/09/15 11:00:13 [alert] 15749#0: *83647387 open socket #687 left in connection 116 2014/09/15 11:00:13 [alert] 15749#0: *172055940 open socket #582 left in connection 158 2014/09/15 11:00:13 [alert] 15749#0: *42380963 open socket #370 left in connection 162 2014/09/15 11:00:13 [alert] 15749#0: *187857605 open socket #834 left in connection 196 2014/09/15 11:00:13 [alert] 15749#0: *96301018 open socket #290 left in connection 226 2014/09/15 11:00:13 [alert] 15749#0: *229257100 open socket #520 left in connection 232 2014/09/15 11:00:13 [alert] 15749#0: *310210706 open socket #453 left in connection 251 2014/09/15 11:00:13 [alert] 15749#0: *273001877 open socket #615 left in connection 307 2014/09/15 11:00:13 [alert] 15749#0: *213129612 open socket #329 left in connection 312 2014/09/15 11:00:13 [alert] 15749#0: *60552692 open socket #839 left in connection 347 

We got about 650 of these errors last time we reloaded. I don't remember ever seeing these errors before. Also it doesn't happen on every reload, only if nginx has been running longer (for now I know that it happens if it's running for at least 24h).

I don't know if this is of any help for you, but if I can help you with attaching a configuration file, or anything else feel free to ask.

Change History (18)

comment:1 by Valentin V. Bartenev, 11 years ago

Yes, please, provide a configuration file.

comment:2 by Strahinja Kustudić, 11 years ago

For some reason I cannot attach a file, so here it is:

user nginx; worker_processes 16; worker_rlimit_nofile 100000; error_log /var/log/nginx/error.log; pid /var/run/nginx.pid; events { worker_connections 10240; use epoll; } http { include /etc/nginx/mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; #access_log /var/log/nginx/access.log main; access_log off; sendfile on; tcp_nopush on; keepalive_timeout 30; reset_timedout_connection on; tcp_nodelay on; server_tokens off; gzip on; gzip_http_version 1.1; gzip_vary on; gzip_comp_level 6; gzip_proxied any; gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript; gzip_buffers 16 8k; charset utf-8; upstream php1 {	server unix:/dev/shm/.php1.socket; } upstream php2 {	server unix:/dev/shm/.php2.socket; } upstream php3 {	server unix:/dev/shm/.php3.socket; } ssl_certificate /etc/pki/tls/certs/site.com.crt; ssl_certificate_key /etc/pki/tls/certs/site.com.key; ssl_ciphers RC4-SHA:AES128-SHA:AES256-SHA:CAMELLIA128-SHA:HIGH:!aNULL:!MD5; ssl_prefer_server_ciphers on; limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s; limit_req_zone $uri zone=two:10m rate=200r/s; include /etc/nginx/conf.d/*.conf; } 

The rest of the files are virtual hosts, which have nothing special in them.

comment:3 by Valentin V. Bartenev, 11 years ago

Nothing special means that you don't use SPDY, right?

comment:4 by Strahinja Kustudić, 11 years ago

I forgot about SPDY, that is special and we do use it.

comment:5 by Valentin V. Bartenev, 11 years ago

What is about dynamic resolving? Do you have the resolver directive? There were a number of bugs in it found and fixed, that could result such problems.

Actually, what you think as "nothing special" can contain a lot of special.

Could you test with spdy turned off? Also, if you are using spdy, it's recommended to switch to the mainline branch.

Last edited 11 years ago by Valentin V. Bartenev (previous) (diff)

comment:6 by Strahinja Kustudić, 11 years ago

We don't use resolver anywhere.

By nothing special, we basically have a few HTTP, SSL and SPDY virtual hosts, which have a few locations with some IF statements as well, a few return 301 url, a few locations have a proxy_pass to an external web server and we have rate limiting on one location and that is basically it.

Regarding SPDY, why is it recommended to use mainline instead of stable?

Version 1, edited 11 years ago by Strahinja Kustudić (previous) (next) (diff)

comment:7 by Strahinja Kustudić, 11 years ago

Also I will try if the same thing will happen tomorrow when I reload and if it does, I will turn off SPDY and try the next day.

in reply to:  6 comment:8 by Valentin V. Bartenev, 11 years ago

Replying to Strahinja Kustudić <strahinjak@nordeus.eu>:

Regarding SPDY, why is it recommended to use mainline instead of stable?

Because the mainline branch receives more bugfixes, which is especially critical when you are using new/experimental functionality. See also: http://nginx.com/blog/nginx-1-6-1-7-released/

comment:9 by Strahinja Kustudić, 11 years ago

I wanted to ask about that blogpost, but I didn't want to derail this bug report. I have mixed feelings about it and it doesn't make that much sense tbh. You are practically saying that stable version has more bugs than the mainline version, even thought the mainline version is introducing new functionality and the stable one isn't. Why have a stable version in the first place, if it is less stable?

So SPDY is an experimental feature in the stable version? Right?

in reply to:  9 comment:10 by Valentin V. Bartenev, 11 years ago

Replying to Strahinja Kustudić <strahinjak@nordeus.eu>:

I wanted to ask about that blogpost, but I didn't want to derail this bug report. I have mixed feelings about it and it doesn't make that much sense tbh. You are practically saying that stable version has more bugs than the mainline version, even thought the mainline version is introducing new functionality and the stable one isn't. Why have a stable version in the first place, if it is less stable?

There're at least two goals for the stable branch:

  • Avoid changes in internal API, don't break 3rd-party modules;
  • Don't break anything: only critical fixes (usually for annoying bugs, that can't be avoided by configuration changes) and with some time lag after being fixed in the mainline branch.

The stable doesn't imply bug-free. And any bug in spdy can be avoided by turning it off.

So SPDY is an experimental feature in the stable version? Right?

The SPDY module is currently experimental regardless of version (as well as the protocol itself is experimental too).

comment:11 by Strahinja Kustudić, 11 years ago

I reloaded today again and the same thing happened. I disabled SPDY now and will try tomorrow what happens.

I also noticed one interesting thing, which might be nothing, but it is a strange configuration. We have a few SSL virtual hosts using the same wild-card SSL certificate, but only one virtual host was using SPDY.

comment:12 by Strahinja Kustudić, 11 years ago

Sorry for not getting back sooner, but you were right. It was an issue with SPDY. After I disabled it I didn't have any problems. For now I'm keeping it disabled, since we didn't notice any performance gain anyway.

comment:13 by Tomáš Kvasnička, 11 years ago

I can confirm this still exists at 1.7.10 with SPDY on and we are able to reproduce it (it happens every time when there are open connections using SPDY & HTTPS and we reload the configuration). Whenever you'll be ready to get deep into debugging this let me know and I can provide all the info you might need.

comment:14 by Maxim Dounin, 10 years ago

Keywords: spdy added

comment:15 by lowks@…, 10 years ago

I can confirm this bug still exists even with http2. We encountered using vanilla 1.9.6 and 1.9.7 nginx. with the following config. Steps to reproduce 1. curl a large file from the server 2. issue a reload during that curl session 3. Things cock up.:

#user nobody; worker_processes 1; #error_log logs/error.log; #error_log logs/error.log notice; #error_log logs/error.log info; #pid logs/nginx.pid; events { worker_connections 2048; } http { include mime.types; default_type application/octet-stream; sendfile on; keepalive_timeout 65; gzip on; # HTTPS server # server { listen 443 ssl http2; server_name localhost; ssl_certificate /home/lowks/temp_certs/myssl.crt; ssl_certificate_key /home/lowks/temp_certs/myssl.key; ssl_session_cache shared:SSL:1m; ssl_session_timeout 5m; ssl_ciphers HIGH:!aNULL:!MD5; ssl_prefer_server_ciphers on; location / { root /home/lowks/Downloads; autoindex on; } } } 

Every time a reload is done, errors like these show up:

2015/12/02 11:36:17 [alert] 3608#0: *1 open socket #3 left in connection 2 2015/12/02 11:36:17 [alert] 3608#0: aborting 2015/12/02 11:37:39 [notice] 3627#0: signal process started 2015/12/02 11:37:46 [notice] 3629#0: signal process started 2015/12/02 11:37:46 [alert] 3628#0: *2 open socket #4 left in connection 2 2015/12/02 11:37:46 [alert] 3628#0: aborting 
Last edited 10 years ago by lowks@… (previous) (diff)

comment:16 by Valentin V. Bartenev, 10 years ago

Status: newaccepted

comment:17 by Valentin V. Bartenev, 10 years ago

Keywords: http2 added

comment:18 by Valentin Bartenev <vbart@…>, 10 years ago

Owner: set to Valentin Bartenev <vbart@…>
Resolution: fixed
Status: acceptedclosed

In 6412:4ba91a4c66a3/nginx:

HTTP/2: implemented per request timeouts (closes #626).

Previously, there were only three timeouts used globally for the whole HTTP/2
connection:

  1. Idle timeout for inactivity when there are no streams in processing (the "http2_idle_timeout" directive);
  1. Receive timeout for incomplete frames when there are no streams in processing (the "http2_recv_timeout" directive);
  1. Send timeout when there are frames waiting in the output queue (the "send_timeout" directive on a server level).

Reaching one of these timeouts leads to HTTP/2 connection close.

This left a number of scenarios when a connection can get stuck without any
processing and timeouts:

  1. A client has sent the headers block partially so nginx starts processing a new stream but cannot continue without the rest of HEADERS and/or CONTINUATION frames;
  1. When nginx waits for the request body;
  1. All streams are stuck on exhausted connection or stream windows.

The first idea that was rejected was to detect when the whole connection
gets stuck because of these situations and set the global receive timeout.
The disadvantage of such approach would be inconsistent behaviour in some
typical use cases. For example, if a user never replies to the browser's
question about where to save the downloaded file, the stream will be
eventually closed by a timeout. On the other hand, this will not happen
if there's some activity in other concurrent streams.

Now almost all the request timeouts work like in HTTP/1.x connections, so
the "client_header_timeout", "client_body_timeout", and "send_timeout" are
respected. These timeouts close the request.

The global timeouts work as before.

Previously, the c->write->delayed flag was abused to avoid setting timeouts on
stream events. Now, the "active" and "ready" flags are manipulated instead to
control the processing of individual streams.

Note: See TracTickets for help on using tickets.