Skip to content

Conversation

deliahu
Copy link
Member

@deliahu deliahu commented Dec 1, 2021

No description provided.

@deliahu deliahu requested a review from RobertLucian December 1, 2021 18:46
@deliahu deliahu changed the title Wait for zero in-flight requests before terminating proxy Wait for zero in-flight requests before terminating realtime proxy Dec 1, 2021
Copy link
Collaborator

@miguelvr miguelvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can achieve the same without adding an endpoint. You can simply wait for in-flight requests when the proxy gets the SIGTERM signal and before calling server.Shutdown()

@miguelvr
Copy link
Collaborator

miguelvr commented Dec 1, 2021

I think we can achieve the same without adding an endpoint. You can simply wait for in-flight requests when the proxy gets the SIGTERM signal and before calling server.Shutdown()

Ref to the code:

cortex/cmd/proxy/main.go

Lines 172 to 189 in 16ffd08

select {
case err = <-errCh:
exit(log, errors.Wrap(err, "failed to start proxy server"))
case <-sigint:
// We received an interrupt signal, shut down.
log.Info("Received TERM signal, handling a graceful shutdown...")
for name, server := range servers {
log.Infof("Shutting down %s server", name)
if err = server.Shutdown(context.Background()); err != nil {
// Error from closing listeners, or context timeout:
log.Warnw("HTTP server Shutdown Error", zap.Error(err))
telemetry.Error(errors.Wrap(err, "HTTP server Shutdown Error"))
}
}
log.Info("Shutdown complete, exiting...")
}
}

@deliahu
Copy link
Member Author

deliahu commented Dec 1, 2021

@miguelvr yeah, that would be cleaner!

I just tried adding it there, but it didn't work as we expected. So I dug into it, and realized that we weren't handling the TERM signal (just INT). After listening for TERM too, checking the breaker's in-flight count was no longer necessary, since server.Shutdown() seems to handle this automatically.

What are your thoughts on the diff now?

@tfriedel
Copy link

tfriedel commented Dec 1, 2021

Thanks for the PR! I built the images and tried my test (route which sleeps 1 sec). Contrary to the previous verion, deleting a pod did not result in 503s. Good job!

@deliahu
Copy link
Member Author

deliahu commented Dec 1, 2021

@tfriedel awesome, thanks for bringing this to our attention, and for confirming that the fix is working as intended!

I'm working on a separate PR now to support custom preStop commands.

@deliahu deliahu merged commit d0b29f3 into master Dec 2, 2021
@deliahu deliahu deleted the proxy-pre-stop branch December 2, 2021 19:17
deliahu added a commit that referenced this pull request Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants