16

I am running a Spring Boot application on Docker Swarm, with PostgreSQL as the database. When I run both as Docker services, the database connection fails repeatedly at irregular intervals (see the timestamps), and the database log shows:

2017-10-26T17:14:15.200415747Z app-db.1.1ayo6h8ro1og@scw-c2964a | LOG: could not receive data from client: Connection reset by peer

2017-10-26T17:43:36.481718562Z app-db.1.1ayo6h8ro1og@scw-c2964a | LOG: could not receive data from client: Connection reset by peer

2017-10-26T17:43:56.954152654Z app-db.1.1ayo6h8ro1og@scw-c2964a | LOG: could not receive data from client: Connection reset by peer

2017-10-26T17:44:17.434171472Z app-db.1.1ayo6h8ro1og@scw-c2964a | LOG: could not receive data from client: Connection reset by peer

2017-10-26T17:49:04.154174253Z app-db.1.1ayo6h8ro1og@scw-c2964a | LOG: could not receive data from client: Connection reset by peer

I couldn't understand or discover the reason for this. I'd appreciate any ideas.

Edit:

We also noticed that, while testing the application, it throws an error like this:

SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 937517ms

Thanks.

2 Answers

11

I got the same error when deploying a Docker Swarm stack with a Spring Boot app and PostgreSQL. After battling with it for about a week, I figured out that the cause was a firewall dropping connections between containers because of inactivity. Quick answer: run the following command on the Linux machine:

    sudo sysctl -w \
      net.ipv4.tcp_keepalive_time=600 \
      net.ipv4.tcp_keepalive_intvl=60 \
      net.ipv4.tcp_keepalive_probes=3

I also added the following Tomcat connection pool properties:

    tomcat:
      max-active: 10
      initial-size: 5
      max-idle: 8
      min-idle: 5
      test-on-borrow: true
      test-while-idle: true
      test-on-return: false
      test-on-connect: true
      validation-query: SELECT 1
      validation-interval: 30000
      max-wait: 30000
      min-evictable-idle-time-millis: 60000
      time-between-eviction-runs-millis: 5000
      remove-abandoned: true
      remove-abandoned-timeout: 60

The solution came from this blog post: "Dealing with NodeNotAvailable Exceptions in Elasticsearch".
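The error in the question comes from HikariCP (`HikariPool-1`), not from the Tomcat pool, so the properties above would not apply to that setup unchanged. As a rough sketch only, a comparable HikariCP configuration in `application.yml` might look like the following. The values are illustrative assumptions, not tested recommendations. The host `app-db` is taken from the service name in the question's logs, while the port and the database name `appdb` are hypothetical. `keepalive-time` needs a reasonably recent HikariCP release, so check the version bundled with your Spring Boot.

```yaml
spring:
  datasource:
    # Hypothetical URL; tcpKeepAlive=true asks the PostgreSQL JDBC driver
    # to enable TCP keepalive on its sockets.
    url: jdbc:postgresql://app-db:5432/appdb?tcpKeepAlive=true
    hikari:
      maximum-pool-size: 10
      minimum-idle: 5
      # Fail a connection request after 30 s instead of waiting much longer.
      connection-timeout: 30000
      # Retire idle connections above minimum-idle after 5 minutes.
      idle-timeout: 300000
      # Periodically ping idle connections so they do not look inactive
      # to whatever sits between the containers (assumed value: 2 minutes).
      keepalive-time: 120000
      # Recycle every connection after 10 minutes (assumed value).
      max-lifetime: 600000
```

The idea is the same as in the Tomcat settings: keep idle connections from staying silent long enough to be dropped, and replace them before the pool hands out a dead one.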

3
  • I will try this as soon as possible. Thanks for your help! Commented Nov 27, 2017 at 16:34
  • Hi, I tried the solution, applying only the first part. It's been up since yesterday and has not failed. I guess it works :) Thanks a lot! Commented Nov 29, 2017 at 8:51
  • Containers running kernel 4.13 or later will no longer inherit tcp_keepalive_time from the host (source: success.docker.com/article/ipvs-connection-timeout-issue), so this approach will no longer work with newer containers. However, as of Docker 19.03 there is a sysctl option that can be supplied to services (e.g. in a compose file). This can be used to set the above flags directly in the containers without messing with the host. docs.docker.com/compose/compose-file/#sysctls Commented Dec 12, 2019 at 17:36
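
The per-service `sysctls` option mentioned in the last comment could be sketched roughly as follows. This is a minimal fragment under assumptions: the service name `app-db` is taken from the logs in the question, the `postgres` image and the file format version are guesses, and the keepalive values simply mirror the host-level command above. It relies on Docker 19.03 or later, as the comment states.

```yaml
version: '3.8'   # assumed; use a file format version that supports sysctls in swarm mode
services:
  app-db:
    image: postgres   # hypothetical image, for illustration only
    sysctls:
      # Same values as the host-level sysctl command, applied inside the container.
      net.ipv4.tcp_keepalive_time: 600
      net.ipv4.tcp_keepalive_intvl: 60
      net.ipv4.tcp_keepalive_probes: 3
```

Setting the values per service avoids changing the host and keeps the configuration with the stack definition. The same block may also be needed on the application service, depending on which side sends the keepalive probes.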
6

There is another way to prevent idle connections from being closed. The problem is related to the default Swarm service discovery, which closes idle connections after 15 minutes.
Explicitly specifying the dnsrr endpoint mode resolves the problem, e.g.:

    version: '3.3'
    services:
      foo-service:
        image: example/foo-service:latest
        hostname: foo-service
        networks:
          - foo_network
        deploy:
          endpoint_mode: dnsrr
        # ...
    networks:
      foo_network:
        external: true
        driver: overlay
1
  • Amazing. I haven't found anything about this behavior in Docker docs. How did you figure it out? Maybe, we should submit a PR into Docker docs? Commented Jan 30, 2021 at 17:19