uWSGI randomly resets TCP connections #130

@marcquark

I stumbled across this problem when I wanted to "quickly" spin up a toy installation of Netbox at home. I used a slightly modified version of the example playbook for a single-host deployment; Netbox installed fine on Ubuntu 20.04 and was accessible via browser. Since it was completely empty, I figured I'd start out by importing the community device types using this fine tool.
However, after a few requests (usually the 2nd or 3rd), the script would always abort with a "connection reset by peer" exception. I tcpdumped the traffic when it happened, and lo and behold, after a simple (and perfectly valid) GET request, the server would just kill the TCP connection without sending an HTTP response. The request did not show up in requests.log, and there were no errors in application.log. Using uwsgitop, I also could not find any signs of crashing workers or resource shortage (one of my first random attempts was to bump the VM from 2 cores / 2 GB RAM to 4 cores / 4 GB RAM).

This seems to be a uWSGI problem; I found lots of threads about it on GitHub and some other forums. The solutions vary wildly: sometimes it was a browser-specific bug, sometimes the author had control over the application and could change how it handled certain things, and so on. So I started playing around with my uwsgi.ini, and after some trial and error and more googling, I read this part of the uWSGI docs. Finally, I arrived at the following combination of uWSGI options that got rid of the problem:

    http-keepalive=true
    http-auto-chunked=true
    add-header=Connection: Close

I'd also like to add that I've successfully run Netbox in production using this playbook before and never encountered this issue. That might have been on Ubuntu 18.04 though; I can't remember. It could be that some bug in recent versions of Netbox makes uWSGI exhibit this behaviour. Since their recommended application server is gunicorn, I doubt that anybody can be bothered to debug it though.

I'm looking for feedback on how the role should be adjusted to account for this bug, which apparently doesn't affect everybody. Should the above settings be made the default, just in case? Then again, it's a bit of a wacky solution and most likely implies a performance hit, so it could also just go into the README as a hint.
Is it worth writing a test case? It should be fairly easy to provoke by firing off a few requests against the API using requests or pynetbox.
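
For what it's worth, here is a rough sketch of what such a test could look like. To keep it dependency-free it uses the standard library's http.client instead of requests or pynetbox, and it reuses one keep-alive connection, since I assume connection reuse is what triggers the resets. The function name, the `/api/` path and the default of 10 attempts are placeholders I made up, not anything from the role or from Netbox's docs.

```python
import http.client


def count_dropped_requests(host, port, path="/api/", attempts=10, timeout=5.0):
    """Send `attempts` GET requests over one keep-alive connection.

    Returns how many requests ended with the server closing or resetting
    the connection instead of sending an HTTP response.
    """
    dropped = 0
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    for _ in range(attempts):
        try:
            conn.request("GET", path, headers={"Accept": "application/json"})
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
        except (ConnectionResetError, BrokenPipeError,
                http.client.RemoteDisconnected):
            dropped += 1
            conn.close()  # http.client reconnects on the next request()
    conn.close()
    return dropped
```

Against an affected instance I would expect something like `count_dropped_requests("netbox.example.lan", 80)` (hostname is a placeholder) to return a non-zero count, and 0 once the options above are set. A real test would probably also need an API token header, which I left out here.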
